1.  Introduction


The  tremendous  engineering  advances  made  in  Very  Large  Scale  Integration  (VLSI)  fabrication 
technology  have  stimulated  considerable  theoretical  interest  in  VLSI  circuit  layout  problems. 
Most  of  this  effort  has  centered  on  minimizing  the  layout  area  of  a  circuit  on  a  chip.  This  is  due, 
in  part,  to  the  fact  that  layouts  which  consume  large  amounts  of  chip  area  are  more  expensive 
to  fabricate,  less  reliable  and  harder  to  test  than  layouts  which  consume  smaller  amounts  of  chip 
area. 

Other layout-related issues that have been studied include: minimizing propagation delay
(either  by  decreasing  wire  lengths  or  by  increasing  transistor  sizes),  minimizing  the  number  of 
wire  crossings  in  a  layout,  producing  regular  layouts  for  gate-arrays,  designing  chips  that  can 
later  be  configured  to  realize  a  large  number  of  circuits,  configuring  networks  around  defective 
cells  on  a  wafer,  and  assembling  large  systems  of  processors  from  copies  of  a  single  basic  chip 
which  has  few  external  pin  connections. 

Most  theoretical  techniques  devised  thus  far  are  based  on  the  divide-and-conquer  paradigm 
and  require  the  use  of  a  separator  theorem  to  recursively  partition  a  given  circuit.  Although 
separator-based  techniques  work  well  for  some  graphs,  they  perform  very  poorly  for  others. 

In  this  paper  we  propose  an  alternative  framework  for  solving  VLSI  graph  layout  problems. 
Like previous approaches, the new framework is based on the divide-and-conquer paradigm.
Instead  of  using  a  separator  theorem  to  recursively  partition  a  graph,  the  new  framework  requires 
the  use  of  a  bifurcator.  The  difference  between  bifurcators  and  separators  will,  of  course,  be 
explained  in  the  paper,  but  the  two  primary  advantages  of  bifurcators  over  separators  may  be 
stated  here.  First,  unlike  separators,  bifurcators  may  be  efficiently  computed  using  either  a  good 
graph  partitioning  heuristic,  or  from  a  layout  with  small  area.  Second,  bifurcators  can  be  used 
to  produce  layouts  that  are  efficient  in  a  variety  of  respects,  not  layout  area  alone. 

For  example,  using  the  notion  of  bifurcators,  an  area-efficient  layout  can  be  transformed  into 
a  layout  which  is  both  area-efficient  and  also  has  small  propagation  delay.  The  same  result 
can  also  be  achieved  if,  instead  of  an  area-efficient  layout,  we  use  an  efficient  graph  bisection 
heuristic.  Separator  theorems  are  inherently  weaker  than  bifurcators  for  such  purposes,  and  no 
other  approach  is  known  to  enjoy  the  versatility  of  bifurcators. 

This paper is based on, and unifies, the work contained in three extended abstracts by Bhatt
and Leiserson [3, 4] and Leighton [21]. Although the results are self-contained, some familiarity
with  recent  results  in  VLSI  layout  theory  would  be  helpful  in  reading  this  paper.  A  fairly 
comprehensive  list  of  recent  research  papers  is  included  in  the  references.  In  particular,  Ullman 
[43]  provides  a  good  introduction  to  issues  in  VLSI  layout  theory. 

The  paper  is  divided  into  nine  sections.  In  Section  2,  we  review  the  layout  model  and  the 
separator-based  approach  to  VLSI  layout.  In  Section  3,  we  formally  state  eight  VLSI  layout 
problems  and  briefly  review  the  progress  made  on  each  problem.  The  combinatorial  lemmas 
proved  in  Section  4  provide  the  basis  of  the  new  framework  described  in  Section  5.  In  Section 
6,  we  describe  how  the  framework  can  be  used  to  efficiently  solve  the  eight  layout  problems 
described in Section 3. Section 7 shows how a good graph bisection heuristic can be used to
produce a provably good layout strategy. In Section 8, we prove that the upper bounds for area,
crossing number and minimax edge length found in Section 6 are existentially optimal. The paper
concludes  with  some  remarks  and  open  questions  in  Section  9. 


2.  Background 

Thompson [41, 42] provided the first formal model for VLSI circuit layout. The model is
simply  stated  and  captures  the  important  aspects  of  layout  problems  in  a  realistic  manner.  A 
brief  description  of  the  model  is  included  in  Section  2.1.  In  addition,  Thompson  also  proved 
some  elementary  upper  and  lower  bounds  on  the  area  required  to  lay  out  an  arbitrary  graph, 
which  are  discussed  in  Section  2.2.  More  general  bounds  were  obtained  later  by  Leiserson  [26, 
27]  and  Valiant  [45],  who  independently  developed  a  divide-and-conquer  layout  strategy  based 
on  separator  theorems.  Section  2.3  summarizes  their  results  and  highlights  a  major  deficiency  of 
any  layout  scheme  based  on  separator  theorems. 

2.1.  The  Layout  Model 

In  order  to  cast  VLSI  layout  problems  within  a  mathematical  framework,  Thompson  [41,  42] 
developed  a  formal  model  for  VLSI  graph  layout.  The  model  is  based  on,  and  is  consistent  with, 
the  VLSI  design  rules  established  by  Mead  and  Conway  [31].  It  is  also  similar  to  the  widely  used 
Manhattan wiring model. In the Thompson grid model, a layout for a graph is characterized as
an  embedding  within  a  two-dimensional  grid.  A  two-dimensional  grid  is  a  collection  of  horizontal 
and  vertical  tracks  spaced  apart  at  unit  intervals.  A  layout  for  a  graph  G  is  specified  by  an 
embedding  which  assigns  nodes  of  G  to  points  in  the  grid  where  horizontal  and  vertical  tracks 
intersect,  together  with  an  (incidence-preserving)  assignment  of  the  edges  of  G  to  paths  in  the 
grid.  The  paths  of  the  layout  are  restricted  to  follow  along  grid  tracks  and  are  not  allowed  to 
overlap  for  any  distance  (although  a  vertical  path  segment  may  cross  a  horizontal  path  segment). 
In  addition,  the  paths  may  not  cross  nodes  to  which  they  are  not  adjacent.  For  obvious  reasons, 
we  restrict  our  attention  to  graphs  in  which  no  node  has  degree  greater  than  four.  As  an  example, 
Figure  1  shows  a  layout  for  the  complete  graph  on  four  nodes. 


Figure 1. A layout with area 15.


Figure 2. Every N-node graph can be laid out in $O(N^2)$ area.


Remark. The results in this paper easily extend to variants of the Thompson grid model. For
example,  graphs  with  bounded  valence  greater  than  four  may  be  laid  out  by  mapping  each  node 
to  a  region  of  the  grid,  instead  of  a  single  grid  point.  The  results  are  also  applicable  to  networks 
with  large  processors.  Techniques  for  dealing  with  large  processors  are  described  more  fully  in 
the  discussion  of  Problem  5  in  Sections  3  and  6. 

2.2.  Elementary  Bounds  on  Layout  Area 

Although  there  are  a  variety  of  important  engineering  considerations  in  choosing  one  layout 
for  a  graph  over  other  possible  layouts,  the  best  understood,  and  perhaps  the  most  desirable 
cost  measure  to  minimize  is  layout  area.  The  area  of  a  layout  is  most  naturally  defined  as  the 
area  of  the  “bounding-box”  around  the  layout,  and  equals  the  product  of  the  number  of  vertical 
tracks  and  the  number  of  horizontal  tracks  that  contain  a  node  or  wire  segment  of  the  graph.  For 
example,  the  layout  of  Figure  1  has  area  15.  This  is  not  the  minimum  possible;  there  is  another 
layout  with  area  9. 
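
The area measure can be illustrated with a small sketch. The function below is not part of the paper; it assumes a layout is given as a list of node coordinates plus a list of wire paths (each path a list of corner points on grid tracks), and it computes the bounding-box area. For a connected layout every track inside the bounding box is touched by some node or wire, so this agrees with the track-counting definition above. The names `layout_area`, `square_nodes` and `square_wires` are hypothetical.

```python
def layout_area(nodes, wires):
    """Bounding-box area of a grid layout.

    nodes: list of (x, y) grid points holding graph nodes.
    wires: list of paths, each a list of (x, y) corner points on grid tracks.
    """
    points = list(nodes)
    for path in wires:
        points.extend(path)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Tracks are spaced at unit intervals, so a span of d units covers d + 1 tracks.
    vertical_tracks = max(xs) - min(xs) + 1
    horizontal_tracks = max(ys) - min(ys) + 1
    return vertical_tracks * horizontal_tracks


# Hypothetical example: a 4-cycle laid out on a 2 x 2 block of grid points.
square_nodes = [(0, 0), (1, 0), (1, 1), (0, 1)]
square_wires = [
    [(0, 0), (1, 0)],
    [(1, 0), (1, 1)],
    [(1, 1), (0, 1)],
    [(0, 1), (0, 0)],
]
print(layout_area(square_nodes, square_wires))
```

The sketch does not check that the paths are legal in the grid model (no overlaps, no crossing of non-adjacent nodes); it only measures the area of whatever layout it is given.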

How  much  area  does  an  N-node  graph  require?  Clearly,  the  area  cannot  be  less  than  the 
number  N  of  nodes.  On  the  other  hand,  by  embedding  nodes  at  equally  spaced  intervals  along 
a line, and using a distinct horizontal track for each edge (as shown in Figure 2), it is clear that
the area required for an N-node graph is no greater than $O(N^2)$. These bounds are independent
of the structure of the graph and hold for all N-node graphs. In general, however, the minimum
area  needed  to  lay  out  a  graph  depends  on  the  graph. 

Thompson  [41,  42]  identified  bisection  width  as  an  important  property  of  graphs  that  affects 
minimum  layout  area.  The  bisection  width  of  a  graph  is  the  minimum  number  of  edges  which 
must  be  removed  from  the  graph  in  order  to  disconnect  it  into  two  equal-size  pieces.  (Two 
graphs  are  said  to  be  of  equal  size  if  the  difference  in  the  numbers  of  nodes  is  no  more  than  one.) 
Thompson  showed  that,  up  to  a  constant  factor,  the  layout  area  can  be  no  less  than  the  square 
of the bisection width. Therefore, if the bisection width for a graph is known, a lower bound
on area can be easily computed. By showing that certain computationally powerful graphs such
as  the  shuffle-exchange  graph  have  large  bisection  width,  Thompson  showed  that  these  graphs 
require  large  area.  In  fact,  Thompson  extended  this  observation  to  obtain  area-time  tradeoffs  for 
computing  certain  functions. 
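
For very small graphs the bisection width can be found by exhaustive search, which makes the definition concrete. The sketch below is an illustration only: it tries every equal-size split, so it takes exponential time, and it is not a method proposed in the paper. The function name `bisection_width` and the example graphs are assumptions made for the illustration.

```python
from itertools import combinations


def bisection_width(n, edges):
    """Minimum number of edges cut over all splits into equal-size pieces.

    Nodes are 0 .. n-1; edges is a list of (u, v) pairs. Pieces have sizes
    n // 2 and n - n // 2, so they differ by at most one node.
    """
    best = None
    for half in combinations(range(n), n // 2):
        side = set(half)
        # An edge is cut when its endpoints fall on different sides.
        cut = sum(1 for u, v in edges if (u in side) != (v in side))
        if best is None or cut < best:
            best = cut
    return best


cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
k4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
for name, edges in (("4-cycle", cycle4), ("K4", k4)):
    width = bisection_width(4, edges)
    # The square of the width is the quantity in Thompson's lower bound,
    # up to a constant factor that this sketch does not attempt to supply.
    print(name, width, width ** 2)
```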

Leighton [19, 20] identified crossing number as another general property that affects layout
area.  The  crossing  number  of  a  graph  is  defined  as  the  minimum  number  of  edge  crossings  in 
any  drawing  of  the  graph  in  the  plane.  It  is  easy  to  see  that  the  crossing  number  of  a  graph  is 
a  lower  bound  on  layout  area.  Using  more  sophisticated  arguments  for  special  graphs,  Leighton 
also  directly  obtained  lower  bounds  on  total  wire  length  (the  sum  of  the  lengths  of  the  wires  in  a 
layout),  which  of  course  is  a  lower  bound  on  layout  area.  These  techniques  are  heavily  dependent 
on  the  recursive  structure  of  the  special  graphs  and  will  be  generalized  in  Section  8. 

2.3.  Layouts  Based  on  Separator  Theorems 

Leiserson  [26,  27]  and  Valiant  [45]  investigated  general  properties  that  provide  effective  upper 
bounds on layout area. They independently developed a divide-and-conquer strategy for graph
layout and showed, for example, that every N-node tree can be laid out in $O(N)$ area and that
every N-node planar graph can be laid out in $O(N \log^2 N)$ area. Their technique is based on the
notion  of  separator  theorems  for  graphs. 

Definition: A class of graphs which is closed under the subgraph relation is said to have
an $f(x)$-separator theorem if there exist constants $a$ and $b$ where $0 < a < 1/2$ and $b > 0$
such that every N-node graph in the class can be partitioned (by the removal of at most
$b f(N)$ edges of the graph) into disjoint subgraphs having $a'N$ and $(1 - a')N$ nodes where
$a < a' < 1 - a$.


Given a class of graphs for which a separator theorem is known (e.g., trees have a 1-separator
theorem [28] and planar graphs have a $\sqrt{x}$-separator theorem [29]), it is possible to construct a
layout for any N-node graph in the class by using a simple divide-and-conquer approach. For
example, Leiserson [26, 27] proved the following upper bounds on layout area.

$x^\alpha$-separator theorem    Layout Area

$\alpha < 1/2$                  $O(N)$

$\alpha = 1/2$                  $O(N \log^2 N)$

$\alpha > 1/2$                  $O(N^{2\alpha})$
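
The three regimes in the table can be written as a small function, which may help when comparing bounds for a particular N. This is only a sketch: it drops all constant factors, uses base-2 logarithms as an arbitrary choice, and the name `separator_area_bound` is hypothetical.

```python
import math


def separator_area_bound(n, alpha):
    """Growth term of the area bound for an x**alpha separator theorem.

    Constant factors are ignored; only the shape of the bound is returned.
    """
    if alpha < 0.5:
        return float(n)
    if alpha == 0.5:
        return n * math.log2(n) ** 2
    return float(n) ** (2 * alpha)


for alpha in (0.25, 0.5, 0.75, 1.0):
    print(alpha, separator_area_bound(1024, alpha))
```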

Remark.  The  complete  recursive  decomposition  of  a  graph  must  be  provided  as  input  before 
layouts  achieving  the  desired  area  bounds  can  be  constructed  by  the  procedure.  There  is  no 
polynomial  time  algorithm  known  that  achieves  the  area  bounds  if  the  decomposition  is  not 
provided. This severely limits the applicability of separator-based layout strategies to classes of
graphs  (such  as  trees  or  planar  graphs)  for  which  actual  decompositions  are  known. 

How good are the preceding area bounds? Thompson [41, 42] and Leighton [19, 20] showed
that none of the bounds can be improved. More precisely, they showed that within each class
there is a graph for which the bound is optimal. But this does not mean that the bounds are
optimal for every graph within a class. In fact, while the bounds are existentially optimal, they
are not universally optimal. For example, an N-node square grid requires area N, but since the
minimum separator theorem for the class of square grids is $\sqrt{x}$, the best bound obtainable by
separator-based layouts is $O(N \log^2 N)$, which is off by a factor of $O(\log^2 N)$ from the optimal.
Of course, since N-node graphs require area at least N, the bounds for graphs with $x^\alpha$-separator
theorems, where $\alpha < 1/2$, are asymptotically universally optimal.

For graphs with larger separator theorems, the discrepancy between the minimum layout area
and that given in the table can be much worse. Consider, for example, the N-node graph $S_N$
which consists of $N/\log N$ disjoint $(\log N)$-node expander graphs. We define an m-node expander
graph to be a graph for which any subset of k nodes is linked by $\Omega(\min(k, m - k))$ edges to the
$m - k$ nodes outside the subset. The bisection width of such a graph is $\Omega(m)$, and hence the
minimum separator theorem is $\Omega(x)$. The existence of trivalent graphs that satisfy this definition
has been known for a long time [12, 15, 44]. In fact, almost all trivalent graphs satisfy this
definition. We caution the reader that the term "expander graph" has two definitions in the
literature. The other definition is sufficient for our purposes and probably more standard but
requires graphs with higher node degrees. Since each $(\log N)$-node expander graph can be trivially
laid out in $O(\log^2 N)$ area, the layout area of $S_N$ is no greater than $O(N \log N)$. However,
Leighton [21] showed that the minimum separator theorem for the class of graphs $S_N$ exceeds
$\Omega(x/\log^2 x)$, so that the area bound from the table above is $O(N^2/\log^4 N)$, which is much worse
than the optimal bound of $O(N \log N)$.
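
The size of this gap can be made explicit with a short calculation. The block below is a sketch that takes the separator bound to be $f(x) = x/\log^2 x$ and reads the table's bound for large separators as the square of the separator size; both readings are assumptions drawn from the surrounding text, and constant factors are ignored.

```latex
\[
  f(N)^2 = \left(\frac{N}{\log^2 N}\right)^{2} = \frac{N^2}{\log^4 N},
  \qquad
  \frac{N^2/\log^4 N}{N \log N} = \frac{N}{\log^5 N}.
\]
```

Under these assumptions the separator-based bound exceeds the achievable area by a factor of about $N/\log^5 N$, which grows nearly linearly in N.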

Remark. The careful reader will notice (as did the referee) that any class of graphs closed
under the subgraph relation and containing $S_N$ must also contain expander graphs. Hence, the
minimum separator for the class is $\Omega(x)$. In order to get around such technicalities with the
definition, the concept of a separator is often just applied to a single graph and the subgraphs
produced by its recursive decomposition. Using the less restrictive (but more useful) definition,
it is possible to show that $S_N$ has an $O(N/\log N)$-separator. The $(\log N)$-node expander graphs
are split in the upper levels of the decomposition and never appear intact as subgraphs in the
lower levels of the decomposition. Leighton proved that even using the most liberal definition,
the minimum separator for $S_N$ is at least $\Omega(N/\log^2 N)$. Any bound on layout area for $S_N$ based
on the minimum separator can be no less than $\Omega(N^2/\log^4 N)$.

Thus,  while  the  divide-and-conquer  strategy  based  on  separator  theorems  gives  existentially 
optimal  bounds,  the  bounds  can  be  unacceptably  poor  in  a  universal  sense.  It  was  the  discovery 
of  such  large  discrepancies  that  led  to  the  search  for  an  alternative  framework  for  VLSI  layout. 
Within  the  new  framework  presented  in  Section  5  we  shall  see  how  these  large  discrepancies  are 
overcome. 

3.  Eight  VLSI  Graph  Layout  Problems 

As mentioned earlier, there are many important considerations in choosing one layout over a
multitude of other possible layouts. The problems in this section are motivated by some of the
basic  engineering  concerns.  Although  this  list  is  not  meant  to  be  exhaustive,  it  covers  most  of 
the  theoretical  issues  studied  recently.  Many  of  the  problems  are  known  to  be  NP-Complete,  so 
the  solutions  we  later  obtain  will,  of  course,  not  be  optimal.  Rather,  the  major  emphasis  of  this 
paper  is  the  development  of  a  general  framework  for  handling  layout  problems  efficiently  and  in 
a  uniform  manner.  Within  the  framework,  solutions  to  some  problems  are  close  to  optimal.  For 
other  problems,  good  heuristics  are  developed  and/or  general  bounds  are  obtained. 

Problem  1.  Given  a  graph  G,  produce  an  area-efficient  layout  for  G. 

As  mentioned  before,  minimizing  area  is  a  critical  concern  in  VLSI  circuit  layout.  In  addition 
to the work on area-efficient layouts described in the previous section, Dolev, Leighton, and
Trickey [9] have shown that determining the minimum layout area of a forest of trees is
NP-Complete.

Problem  2.  Given  a  graph  G,  produce  an  area-efficient  layout  for  G  with  minimax  edge  length. 

Besides  area,  speed  is  another  critical  factor  in  chip  performance.  Signals  do  not  propagate 
instantaneously  across  wires,  and  the  longer  the  wire,  the  longer  the  propagation  delay.  In 
pipelined  or  systolic  systems,  the  effect  of  propagation  delays  is  even  more  dramatic.  The 
maximum delay determines the clock period, and hence the throughput, of the system. To
maximize  throughput  we  need  to  minimize  the  maximum  delay.  In  short,  we  must  produce 
layouts  so  that  the  longest  edge  is  as  short  as  possible.  The  minimum,  over  all  layouts,  of  the 
length  of  the  longest  edge  is  called  the  minimax  edge  length. 

Paterson, Ruzzo and Snyder [34] studied the problem of minimizing edge lengths for complete
binary trees. They showed that the minimax edge length of an N-node complete binary tree is
$\Theta(\sqrt{N}/\lg N)$. Adopting a different strategy based on separator theorems, Bhatt and Leiserson
[3] subsequently extended the upper bound portion of the result to arbitrary trees, and to all
graphs with small (i.e., $x^\alpha$, $\alpha < 1/2$) separator theorems. Bhatt and Cosmadakis [2] showed that
computing  the  minimax  edge  length  of  a  tree  is  NP-Complete. 

Problem  3.  Given  a  graph,  produce  an  area-efficient  layout  in  which  each  wire  has  bounded 
delay  in  the  capacitive  model. 

Although  it  is  certainly  true  that  propagation  delay  across  a  wire  depends  on  the  length 
of  the  wire,  there  has  been  little  consensus  on  how  fast  propagation  delay  grows  as  a  function 
of  wire  length.  Thompson  [41,  42]  assumes  propagation  delay  to  be  constant,  independent  of 
wire  length.  This  might  seem  unreasonable  given  the  ultimate  speed-of-light  limitation  which 
indicates that the delay increases linearly with length. The speed-of-light limitation, however,
greatly exaggerates the importance of wire delay in determining the speed of circuits. Mead and
Conway [31] take into account some of the electrical characteristics of interconnections on MOS
integrated circuits, and emphasize the role of wire capacitance in determining propagation delay.
Recent analysis by Bilardi, Pracchi, and Preparata [5] strongly supports the belief that capacitive
effects play the predominant role in determining the speed of MOS circuits.

In a capacitive model, each wire is assumed to present a purely capacitive load to the transistor
that drives a signal across the wire. This load is proportional to the length of the wire plus the
area  of  the  transistor  that  receives  the  signal.  The  delay  is  proportional  to  this  load  divided  by 
the  area  of  the  driving  transistor.  By  increasing  the  size  of  the  driving  transistor  it  is  therefore 
possible  to  bound  the  propagation  delay,  independent  of  the  length  of  the  wire.  A  second  well- 
known  technique  for  reducing  delay  across  a  long  wire  is  to  “ramp”  the  wire  with  a  geometrically 
increasing  series  of  inverters  [31].  The  number  of  intermediate  drivers,  and  hence  the  delay,  is 
logarithmic  in  the  length  of  the  wire,  but  an  attractive  feature  is  that  this  process  can  be  carried 
out  without  the  need  to  resize  the  original  transistors  in  the  circuit. 
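
The two techniques can be compared numerically. The sketch below is an illustration under stated assumptions, not a circuit model from the paper: delays are in arbitrary units with the proportionality constant set to 1, the ramp starts from a unit-size driver, and each inverter is `ratio` times larger than the one before it. The names `capacitive_delay` and `ramp_stages` are hypothetical.

```python
def capacitive_delay(wire_length, receiver_area, driver_area):
    """Delay in arbitrary units: capacitive load divided by driver area."""
    load = wire_length + receiver_area
    return load / driver_area


def ramp_stages(wire_length, ratio=2):
    """Count inverters in a ramp whose sizes grow by `ratio` per stage.

    Starting from a unit-size driver, stages are added until the last one
    is within a factor `ratio` of the wire load, so every stage drives a
    load at most `ratio` times its own size.
    """
    size, stages = 1, 0
    while size * ratio < wire_length:
        size *= ratio
        stages += 1
    return stages


print(capacitive_delay(99, 1, 1))    # unit-size driver on a long wire
print(capacitive_delay(99, 1, 100))  # driver scaled up to match the load
print(ramp_stages(1024))             # stage count grows like log of the length
```

Scaling the driver keeps the delay constant at the cost of a transistor whose area grows with the wire, whereas the ramp leaves the original transistors alone and pays a logarithmic number of stages.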

Of  course,  increasing  the  size  of  one  transistor  or  introducing  new  transistors  might  force  some 
wires  to  be  stretched  to  avoid  the  enlarged  transistor  area.  In  other  words,  decreasing  the  delay 
across one wire might force an increase in delay over other wires. Leiserson [24] and Mehlhorn [32]
independently  posed  the  question  of  whether  or  not  the  transistors  in  a  layout  could  be  resized  so 
that  every  wire  in  the  layout  has  constant  propagation  delay.  Ramachandran  [36]  investigated 
the  problem  of  introducing  intermediate  drivers  along  long  wires  to  decrease  delays,  but  under 
the  constraint  that  the  topology  of  the  layout  remain  unchanged.  With  the  restriction  that  wires 
cannot be rerouted, she showed that logarithmic delay can be achieved, but at the expense of
squaring  the  layout  area  in  the  worst  case.  We  allow  the  layout  topology  to  be  changed,  and 
obtain  significantly  better  results. 

Problem  4.  Given  a  graph  G,  produce  a  layout  for  G  with  few  wire  crossings. 

An  undesirable  feature  of  layouts  is  the  presence  of  a  large  number  of  wire  crossings.  When 
two  wires  cross,  they  must  be  on  different  layers.  For  faster  operation,  and  less  power  dissipation, 
it  is  advantageous  to  maximize  the  total  amount  of  wiring  on  a  layer  of  low  resistance,  e.g.  the 
metal  layer,  while  minimizing  the  wiring  on  a  layer  of  high  resistance,  e.g.  the  polysilicon  layer. 
The  net  wiring  on  one  layer  may  be  reduced  by  laying  wires  on  that  layer  only  just  before  and 
after  two  wires  cross.  If  the  number  of  wire  crossings  is  small,  the  number  of  contact-cuts  which 
connect  wire  segments  on  different  layers  is  small  so  that  the  area  of  the  layout  is  not  blown  up 
by  the  contact  cuts  which  occupy  large  area.  In  addition,  long  wires  that  are  crossed  by  many 
other  wires  are  susceptible  to  cross-talk  when  all  the  crossing  wires  simultaneously  carry  the  same 
signal. 

The  crossing  number  of  a  graph  is  defined  to  be  the  minimum  number  of  wire  crossings  in  any 
drawing  of  the  graph  on  the  plane.  Leighton  [19,  20]  proved  upper  and  lower  bounds  on  crossing 
numbers  and  then  used  the  results  to  find  bounds  on  layout  area.  Garey  and  Johnson  [14]  showed 
that  determining  the  crossing  number  of  bipartite  graphs  is  NP-Complete. 

Problem  5.  Given  a  graph,  produce  an  area-efficient  regular  layout  for  the  graph. 

Some  design  methodologies,  most  notably  gate-arrays,  require  that  processors  be  located  at 
fixed  positions  on  a  chip.  In  gate-arrays  the  processors  are  placed  in  a  grid  pattern  with  uniform 
spacing  between  processors  adjacent  along  every  row  and  column.  Such  layouts  are  said  to  be 
regular.  An  important  advantage  of  this  design  restriction  is  its  flexibility:  even  if  the  size  of 
every processor is increased, the wiring between processors remains unaffected and the total
area remains proportional to the sum of the wire area (as computed with unit-size processors)
and the processor area. This is because only the $\sqrt{N}$ rows and columns containing the N
unit-size processors need to be expanded to accommodate the non-unit-size processors. In non-regular
layouts, every row and column might have to be expanded since there might be a node in every
row and in every column. Increasing the linear dimension of the processors by a factor of s could
result in an $O(s^2)$ increase in layout area.

Previous  divide-and-conquer  layout  strategies  do  not  produce  regular  layouts.  Hence,  they 
are  not  useful  in  laying  out  circuits  with  non-unit-size  processors.  A  good  strategy  for  producing 
regular  layouts  would  solve  the  nagging  problem  of  how  to  cope  with  variable-size  processors. 

Problem  6.  Design  area-efficient  chips  that  can  be  configured  to  realize  a  large  number  of  graphs. 

Because  it  is  expensive  to  make  one  chip  but  cheap  to  make  many  copies,  manufacturers  of 
custom  chips  have  been  encouraged  to  make  configurable  designs  such  as  gate-arrays,  ROM’s  and 
PLA’s.  In  such  designs,  the  entire  chip  is  prefabricated  except  for  one  layer.  The  customer  then 
specifies  a  configuration  for  the  chip,  and  the  final  layer  of  metalization  connects  up  the  circuitry 
in  that  particular  way.  Hence,  most  of  the  design  and  fabrication  costs  can  be  factored  over 
many  custom  chips.  Similarly,  the  fast  emerging  laser-restructuring  technology  [35]  provides 
another  economical  way  to  customize  chips  after  fabrication  is  complete.  Laser  restructuring 
allows  connections  between  wires  to  be  made  or  broken  after  the  chip  has  been  fabricated.  In 
either  case,  it  is  desirable  to  design  layouts  that  can  be  configured  from  one  of  a  few  basic  patterns. 

Problem  7.  On  a  wafer  which  has  arbitrarily  distributed  defective  cells,  realize  a  given  graph 
on  the  good  cells. 

In  any  fabrication  process,  it  is  expected  that  some  of  the  processing  cells  will  be  defective. 
In  a  two-dimensional  array  of  cells  on  a  wafer  in  which  defective  cells  are  arbitrarily  distributed, 
it  may  still  be  possible  to  use  the  wafer  by  configuring  wires  around  the  defective  cells.  This 
may,  for  example,  be  performed  by  laser  restructuring  techniques  [35].  Given  this  ability  to 
isolate  defective  cells,  it  is  important  to  consider  how  a  graph  may  be  realized  on  the  remaining 
good  cells.  This  problem  has  received  considerable  attention  recently  [17,  22,  38].  The  problem 
is  similar  to  the  general  graph  layout  problem  in  the  Thompson  model  but  with  the  important 
restriction  that  nodes  of  the  circuit  can  only  be  mapped  to  a  restricted  set  of  nodes  in  the  grid. 

Problem  8.  Given  a  graph  G,  assemble  G  using  the  minimum  number  of  copies  of  a  single  chip 
having  few  external  pin  connections. 

A  number  of  very  large  networks  have  been  proposed  in  recent  years  for  implementing  priority 
queues  [25],  for  searching  [1],  for  direct  execution  of  applicative  programming  languages  [30],  and 
for recognizing regular expressions [11]. Some of these networks are too large to fit on a single chip.
For example, the tree-structured network of [30] is envisioned to contain as many as one million
processing  elements.  Clearly,  such  networks  must  be  partitioned  over  many  interconnected  chips, 
so  that  each  chip  realizes  a  small  portion  of  the  network. 

The technology for packaging chips severely limits the number of external pin connections
on a chip. While chips with over a million components are foreseeable in the near future, no one
predicts  a  chip  with  over  two  hundred  external  pin  connections.  This  poses  a  pressing  problem 
in  assembling  large  networks  of  processors. 

Even if a network could be partitioned so that each portion has only a few external connections,
it would be economically infeasible to design each chip individually. For instance, it would
be prohibitively expensive to design one thousand different chips, each containing a thousand
processing elements, to assemble a network of one million processors. For this reason, it is necessary
to assemble large systems using copies of a few configurable or restructurable chips. One
solution  to  the  problem  of  assembling  large  tree  structures  using  copies  of  a  single,  area-efficient, 
restructurable  chip  with  few  external  pin  connections  was  given  by  Bhatt  and  Leiserson  [4]. 

Within  the  new  framework,  efficient  solutions  are  provided  for  each  of  these  problems.  In  fact, 
a  single  layout  simultaneously  solves  many  of  these  problems  efficiently.  The  framework  provides 
a  two-step  strategy  for  solving  these  problems.  First,  the  graph  to  be  laid  out  is  embedded  within 
a  very  special  network  called  the  tree  of  meshes.  For  the  tree  of  meshes  it  is  possible  to  solve  all 
these  problems  efficiently.  In  the  second  step  therefore,  a  good  layout  for  the  tree  of  meshes  also 
solves  these  problems  for  the  embedded  graph. 

4.  Combinatorial  Lemmas 

This  section  contains  three  combinatorial  lemmas  which  provide  the  foundation  for  the  framework 
presented  in  the  next  section. 

Lemma 1. Consider any two-ended string of n colored pearls of k different colors, and let $n_i$
be the number of pearls which are color i for $1 \le i \le k$. For any integer $r \ge 2$ the pearls can
be partitioned into two sets by cutting the string in no more than $9r^k$ places such that the total
number of pearls in each set is $\lfloor n/2 \rfloor$ or $\lceil n/2 \rceil$, the number of pearls of color 1 in each set is
$\lfloor n_1/2 \rfloor$ or $\lceil n_1/2 \rceil$, and such that the number of pearls of color $i > 1$ in each set lies between
$\lceil (\frac{1}{2} - \frac{1}{r}) n_i \rceil$ and $\lfloor (\frac{1}{2} + \frac{1}{r}) n_i \rfloor$.

Proof. Let $i$ be a number between 1 and $k$ and let $T(i)$ denote the number of cuts necessary
to divide the set of all pearls into two sets that satisfy the constraints of the theorem for colors
$1, 2, \ldots, i$. Other than requiring that the total number of pearls be split in half by the cuts, we
have placed no constraints on the distribution of pearls with colors greater than $i$. We wish to
find a good bound on $T(i)$ in the worst case, i.e., over all choices of $n$, $k \ge i$, and all possible
colorings. In what follows, we will show that $T(1) = 2$ and that

$T(i) \le r\,T(i-1) + 4r + 7$

for $i > 1$. As a consequence, we can solve the recurrence to conclude that $T(i) \le 9r^i - 15$ for
$r \ge 2$. Thus for $i = k$, at most $9r^k$ cuts are required, as claimed.
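As a quick sanity check on the closed form, the following sketch iterates the recurrence with equality (the worst case) and compares it against $9r^i - 15$. The function names are ours and purely illustrative; the check covers only the small ranges tested, while the general claim follows by induction, since $r(9r^{i-1} - 15) + 4r + 7 = 9r^i - 11r + 7 \le 9r^i - 15$ whenever $r \ge 2$.

```python
def worst_case_cuts(i: int, r: int) -> int:
    """Iterate T(1) = 2, T(i) = r*T(i-1) + 4r + 7, taking the recurrence with equality."""
    t = 2
    for _ in range(i - 1):
        t = r * t + 4 * r + 7
    return t


def closed_form_bound(i: int, r: int) -> int:
    """The claimed bound 9*r**i - 15, valid for r >= 2."""
    return 9 * r**i - 15


if __name__ == "__main__":
    for r in range(2, 6):
        for i in range(1, 7):
            print(r, i, worst_case_cuts(i, r), closed_form_bound(i, r))
```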

For $i = 1$, it is easy to show that two cuts are sufficient. Consider a "window" of size $\lfloor n/2 \rfloor$
positioned at the left end of the string. Without loss of generality, assume that the window
covers less than $\lfloor n_1/2 \rfloor$ of the pearls colored 1. Move the window to the right, one pearl at a
time, until the window covers $\lfloor n_1/2 \rfloor$ pearls of color 1. Since the right half of the string contains
more than one-half of all pearls of color 1, there must, by continuity, exist a placement when the
window covers exactly one-half of all pearls of color 1. By cutting the string at the endpoints of
the window, the portion of the string under the window will contain half of the total number of
pearls and half of the pearls colored 1. Hence $T(1) = 2$, as claimed.
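The sliding-window argument can be sketched directly. The code below is a minimal illustration, not part of the original proof: the list representation of the string and the name `halve_color_one` are our assumptions. It slides a window of $\lfloor n/2 \rfloor$ pearls and stops at the first placement that covers exactly $\lfloor n_1/2 \rfloor$ pearls of color 1; such a placement exists because the count changes by at most one per step.

```python
def halve_color_one(pearls: list[int], color: int = 1) -> tuple[int, int]:
    """Return (lo, hi) so that pearls[lo:hi] holds floor(n/2) pearls,
    exactly floor(n_1/2) of which have the given color."""
    n = len(pearls)
    w = n // 2                                    # window size
    target = sum(1 for p in pearls if p == color) // 2
    count = sum(1 for p in pearls[:w] if p == color)
    for start in range(n - w + 1):
        if count == target:
            return start, start + w               # the two cuts are the window's endpoints
        if start + w < n:
            # Slide right by one pearl: one pearl enters, one leaves.
            count += (pearls[start + w] == color) - (pearls[start] == color)
    raise AssertionError("unreachable: the count changes by at most 1 per step")
```

The pearls outside the window form the second set, which then holds $\lceil n/2 \rceil$ pearls, $\lceil n_1/2 \rceil$ of them of color 1.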

For a given $i > 1$, break the string into $r$ segments $S_j$, $1 \le j \le r$ (making $r - 1$ cuts), so
that each segment contains at least $\lfloor n_i/r \rfloor$ pearls of color $i$. Next split each $S_j$ into two subsets
$S_{j0}$ and $S_{j1}$ (making a total of $rT(i-1)$ cuts) so that each split satisfies the theorem locally for
colors $1, 2, \ldots, i-1$.

Without loss of generality, assume that $S_{j0}$ contains no fewer pearls of color $i$ than $S_{j1}$. At
this stage, we divide the set $C$ of all pearls into two subsets $C_1$ and $C_2$ as follows. Initially, let
$C_1 = \bigcup_j S_{j0}$. If $C_1$ contains more than $\lfloor (\frac{1}{2} + \frac{1}{2r}) n_i \rfloor$ pearls of color $i$, remove $S_{10}$ from $C_1$ and
add $S_{11}$. Repeat this procedure, successively switching $S_{20}$ with $S_{21}$, $S_{30}$ with $S_{31}$, and so on
until the first time $C_1$ has at most $\lfloor (\frac{1}{2} + \frac{1}{2r}) n_i \rfloor$ pearls of color $i$. Such a stage must occur since
the number of pearls of color $i$ in $C_1$ will eventually fall below $\lceil n_i/2 \rceil$ if $C_1$ and $C_2$ are completely
interchanged. The number of pearls of color $i$ in $C_1$ after the final switch cannot be less than
$\lceil (\frac{1}{2} - \frac{1}{2r}) n_i \rceil - 2$ since every $S_j$ contains no more than $\lceil n_i/r \rceil$ pearls of color $i$. If the number
of pearls of color $i$ in $C_1$ is $\lceil (\frac{1}{2} - \frac{1}{2r}) n_i \rceil - 1$ or $\lceil (\frac{1}{2} - \frac{1}{2r}) n_i \rceil - 2$, then move either one or two
pearls of color $i$ from $C_2$ to $C_1$, making no more than four cuts.

We also have to ensure that the total set of pearls and the pearls of the first $i - 1$ colors are
divided as required. The pearls with colors between 2 and $i - 1$ are divided correctly because
they were divided correctly at the recursive step. The counts of pearls of color 1 in $C_1$ and $C_2$
may differ in size by $r$, however. To balance the number of pearls with color 1 in each set, we
need only remove up to $\lfloor r/2 \rfloor$ pearls colored 1 from the excess set (making at most $r$ cuts) and
put them in the deficient set. To balance the difference in the overall sizes of the sets (which
now might be as large as $2r + 4$), we need only extract up to $r + 2$ pearls from the larger set
(making no more than $2r + 4$ cuts) and put them in the smaller set. Of course, these pearls must
be chosen carefully so that each set retains the required minimum number of pearls of each color.
Since pearls are extracted only from the larger set, it is clear that this requirement may be easily
satisfied.

The total number of cuts made by the procedure is $rT(i-1) + 4r + 7$, as claimed. ■

Using  an  elegant  topological  argument,  Goldberg  and  West  [16]  recently  proved  that  k  cuts 
suffice  to  divide  the  pearls  of  each  color  exactly  in  half.  In  contrast  to  Lemma  1,  this  is  a  dramatic 
reduction  in  the  number  of  cuts.  We  state  their  result  in  Lemma  2,  although  we  cannot  include 
the  proof  here.  We  will  use  the  stronger  result  in  the  paper  since  it  facilitates  the  proofs  and 
results  in  far  smaller  constants.  It  is  very  important  to  note,  however,  that  all  of  our  layout  results 
may be proved with the weaker Lemma 1. (In fact, we have done so using $r = 3$, but will not
go through the details in this paper.) Since the Goldberg-West result has not yet appeared, we
have  included  Lemma  1  both  for  completeness  and  so  that  our  results  will  not  depend  on  as-yet 
unpublished  work.  Both  results  are  implementable  in  polynomial  time  when  the  number  of  colors 
is  fixed,  as  is  the  case  throughout  this  paper. 


Lemma 2. Consider any two-ended string of $n$ pearls, $n_i$ of which are colored $i$, $1 \le i \le k$.
By cutting the string in $k$ places it is possible to divide the pearls into two sets so that each set has
a total of $\lfloor n/2 \rfloor$ or $\lceil n/2 \rceil$ pearls, and $\lfloor n_i/2 \rfloor$ or $\lceil n_i/2 \rceil$ pearls of color $i$ for all $i$, $1 \le i \le k$.

The  following  lemma  recasts  Lemma  2  in  terms  of  complete  binary  trees.  This  form  is 
particularly  useful  since  the  recursive  decomposition  of  a  graph  may  be  viewed  as  a  tree.  In  the 
following  we  define  the  height  of  a  tree  to  be  the  length  of  the  longest  path  from  the  root  to  a 
leaf.  The  height  of  a  forest  is  defined  to  be  the  maximum  height  of  a  tree  in  the  forest.  Finally, 
the  level  of  a  node  in  the  forest  is  defined  to  be  the  height  of  the  forest  minus  the  length  of  the 
longest  path  from  the  node  to  a  leaf.  (Note  that  the  top  level  is  level  zero.) 

Lemma 3. Consider a forest of complete binary trees whose $n$ leaves are colored arbitrarily with
$k$ colors. Let $n_i$ be the number of leaves colored $i$ for $1 \le i \le k$. By removing no more than $k$
nodes (as well as all incident edges) from each internal level of the forest, it is possible to produce
a new forest of complete binary trees, some subset of which contains $\lfloor n/2 \rfloor$ or $\lceil n/2 \rceil$ leaves, and
$\lfloor n_i/2 \rfloor$ or $\lceil n_i/2 \rceil$ nodes of color $i$ for each $i$, $1 \le i \le k$.

Proof.  Draw  the  trees  in  the  canonical  manner  and  place  them  side-by-side,  in  any  order, 
so  that  the  leaves  of  all  trees  are  placed  along  a  line.  By  applying  Lemma  2  to  the  induced 
left-to-right ordering on the leaves of the forest, it is possible to break the ordering in no more
than  k  places  such  that  the  union  of  the  leaves  contained  in  every  other  segment  contains  the 
desired  total  number  of  leaves  and  the  desired  number  of  leaves  of  each  color. 

For  each  break,  remove  the  nodes  (and  incident  edges)  which  are  simultaneously  ancestors  of 
the  leaf  immediately  to  the  left  of  the  break  and  the  leaf  immediately  to  the  right  of  the  break. 
It  is  easily  seen  that  at  most  one  node  is  removed  from  each  internal  level  of  the  forest  for  each 
break.  Therefore,  no  more  than  k  total  nodes  are  removed  from  each  internal  level.  In  addition, 
the  removal  of  the  common  ancestors  of  the  leaves  neighboring  a  break  divides  the  associated 
tree  into  two  or  more  complete  binary  trees,  at  least  one  on  each  side  of  the  break.  Thus  the 
removal  of  all  such  nodes  produces  a  forest  of  complete  binary  trees,  subsets  of  which  correspond 
precisely  to  the  sets  of  leaves  between  pairs  of  adjacent  break  points.  Thus  the  union  of  the 
subsets  of  trees  corresponding  to  every  other  segment  of  leaves  contains  the  desired  number  of 
leaves of each color. ■
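The node-removal step can be made concrete for a single tree. The sketch below assumes, for illustration only, one complete binary tree with $2^h$ leaves indexed left to right from 0, a break placed between leaves `brk - 1` and `brk`, and nodes named by (level, index) with the root at level 0; none of these conventions come from the text. It lists the internal nodes that are ancestors of both neighbors of the break, which are exactly the nodes the proof removes.

```python
def common_ancestors(height: int, brk: int) -> list[tuple[int, int]]:
    """Internal nodes (level, index) that are ancestors of both leaf brk-1 and leaf brk
    in a complete binary tree with 2**height leaves; level 0 is the root."""
    left, right = brk - 1, brk
    removed = []
    for level in range(height):            # internal levels only; leaves sit at level `height`
        shift = height - level
        if left >> shift == right >> shift:  # same ancestor at this level
            removed.append((level, left >> shift))
    return removed
```

For example, with `height = 3` a break in the middle (`brk = 4`) removes only the root, while a break after the first leaf (`brk = 1`) removes one node on each of the three internal levels. In either case no level loses more than one node per break.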

Figure 3 illustrates the proof of the preceding lemma with a simple example. Initially, the
forest consists of four complete binary trees with seven leaves colored 1, four colored 2, and four
colored 3. Figure 3a shows a leveled drawing of the forest along with three breaks (denoted by
dashed vertical lines) in the line of leaves. The union of leaves in the first and third intervals
contains three leaves colored 1, two of color 2, and two of color 3. In Figure 3b the internal nodes
to be removed are marked X. Figure 3c shows the new forest produced by the removal of the
marked internal nodes.

Figure 3. An illustration of the procedure described in Lemma 3.


5.  The  New  Framework 

In  this  section,  we  describe  the  new  framework  for  solving  VLSI  graph  layout  problems. 
We  start  by  defining  the  notions  of  decomposition  trees  and  bifurcators  for  graphs.  Using  the 
combinatorial  lemmas  from  Section  4,  we  devise  procedures  for  balancing  decomposition  trees 
and  bifurcators.  In  Section  5.3,  balanced  decomposition  trees  are  used  to  embed  graphs  within 
the  tree  of  meshes.  Section  5.4  provides  efficient  layouts  for  the  tree  of  meshes.  Taken  together, 
the  embedding  of  a  graph  in  the  tree  of  meshes  and  the  layout  for  the  tree  of  meshes  induce  a 
layout  for  the  original  graph. 

5.1.  Decomposition  Trees  and  Bifurcators 

The recursive decomposition of a graph into smaller and smaller subgraphs may be viewed as
a decomposition tree. In particular, we say that a graph $G$ has an $(F_0, F_1, \ldots, F_r)$-decomposition
tree if $G$ can be decomposed into two subgraphs $G_0$ and $G_1$ by removing no more than $F_0$ edges
from $G$, and, in turn, both $G_0$ and $G_1$ can be decomposed into smaller subgraphs by removing
no more than $F_1$ edges from each, and so on until each subgraph is either empty or an isolated
node. Figure 4 illustrates this recursive decomposition.

As one might expect, the decomposition of a graph by separator theorems may be viewed
as a decomposition tree. It follows by definition that if a class of graphs has an $f(x)$-separator
theorem, then there are constants $\alpha$ and $\beta$ such that each graph in the class has a decomposition
tree of the form $(\beta f(N), \beta f(\alpha N), \beta f(\alpha^2 N), \ldots, \beta f(1))$. The converse is not necessarily true.
Subgraphs generated at each step of a decomposition by a separator theorem are constrained to
be proportional in size, whereas decomposition trees need not satisfy this constraint. Of course,
if the decomposition tree has precisely $\log N$ levels, then subgraphs at each level must be equal
in size.

Figure 4. An $(F_0, F_1, \ldots, F_r)$-decomposition tree.

We shall be particularly interested in a special class of decomposition trees, namely bifurcators,
that is distinct from the class of separators.

Definition. An $N$-node graph has an $\alpha$-bifurcator of size $F$ (more simply, an $(F, \alpha)$-bifurcator)
if it has an $(F, F/\alpha, F/\alpha^2, \ldots, 1)$-decomposition tree.

Of particular interest is the class of $\sqrt{2}$-bifurcators. By the definition, we know that an $N$-node
graph has a $\sqrt{2}$-bifurcator of size $F$ if and only if it has an $(F, F/\sqrt{2}, F/2, \ldots, 1)$-decomposition
tree. The depth of this tree is no greater than $2 \log F$. In order to completely decompose an
$N$-node graph into individual nodes, the height of any decomposition tree cannot be less than
$\log N$. Thus, $F$ must always be at least $\sqrt{N}$. On the other hand, $F$ is always less than $2N$
since every $N$-node graph with maximum node degree four has at most $2N$ edges.
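To make the definition concrete, the following sketch lists the cut budgets of a $\sqrt{2}$-bifurcator of size $F$ level by level and records the lower bound on $F$ that follows from the depth argument. The helper names are hypothetical, and logarithms are taken base 2.

```python
import math


def bifurcator_profile(F: float) -> list[float]:
    """Cut budgets F, F/sqrt(2), F/2, ..., down to 1 for a sqrt(2)-bifurcator of size F."""
    depth = math.ceil(2 * math.log2(F))      # the budget reaches 1 after 2*log2(F) levels
    return [F * 2 ** (-level / 2) for level in range(depth + 1)]


def min_bifurcator_size(N: int) -> float:
    """Smallest F for which the depth 2*log2(F) is at least log2(N)."""
    return math.sqrt(N)
```

For instance, `bifurcator_profile(4)` has five entries, from 4 down to 1, and a 1024-node graph needs $F \ge 32$ so that the tree can reach depth $\log N = 10$.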

If a class of graphs has an $x^{\alpha}$-separator theorem, where $\alpha < 1/2$, and the corresponding
decomposition is balanced in that every graph is always decomposed into equal-size subgraphs,
then it is straightforward to show that every $N$-node graph in the class has a $\sqrt{2}$-bifurcator of
size $O(\sqrt{N})$. Similarly, if a class of graphs has a balanced separator theorem of size $x^{\alpha}$ with
$\alpha > 1/2$, then every $N$-node graph in the class has a $\sqrt{2}$-bifurcator of size $O(N^{\alpha})$.

The converse is not true even if we consider only bifurcators whose corresponding decomposition
trees are balanced so that every graph is decomposed into equal-size subgraphs. For example,
the $N$-node graph $S_N$ defined in Section 2.3 has a balanced $\sqrt{2}$-bifurcator of size $O(\sqrt{N} \log N)$
but the smallest separator for this class of graphs is $\Omega(x/\log^2 x)$.

When translated into bounds on layout area, this seemingly minor difference between bifurcators
and separators is greatly magnified. Graphs with small layout area always have small
$\sqrt{2}$-bifurcators, but do not always have small separators. This is formalized in the following lemma.
Later on we will prove the converse: graphs with small $\sqrt{2}$-bifurcators always have small layout
area.

Lemma 4. If a graph $G$ can be laid out in area $A$, then $G$ has a $(\sqrt{A}, \sqrt{2})$-bifurcator.

Proof. Consider a vertical cut of length $\sqrt{A}$ through the center of the layout. Next, cut
each of the sublayouts horizontally through the center. Continuing this sequence of alternating
vertical and horizontal cuts, it is easy to see that at the $i$th step no more than $\sqrt{A}/2^{\lfloor i/2 \rfloor}$ edges
are cut from each subgraph. This sequence of cuts yields a $(\sqrt{A}, \sqrt{2})$-bifurcator for $G$. ■
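The alternating cuts are easy to simulate. The sketch below assumes, for illustration, a square layout of side $\sqrt{A}$ with unit-width wires, so that the number of edges crossing a cut is at most the length of the cut; the function name is ours. It records the cut length at each step so that it can be compared with the budget $\sqrt{A}/2^{(i-1)/2}$ that a $(\sqrt{A}, \sqrt{2})$-bifurcator allows at step $i$.

```python
import math


def cut_lengths(area: float, steps: int) -> list[float]:
    """Length of the cut made at steps 1..steps when a square of the given area
    is repeatedly halved by alternating vertical and horizontal cuts."""
    width = height = math.sqrt(area)
    lengths = []
    for step in range(1, steps + 1):
        if step % 2 == 1:          # vertical cut: spans the current height
            lengths.append(height)
            width /= 2
        else:                      # horizontal cut: spans the current width
            lengths.append(width)
            height /= 2
    return lengths
```

With `area = 64` the first four cuts have lengths 8, 4, 4 and 2, each within the corresponding budget of 8, about 5.66, 4 and about 2.83.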

5.1.1  Special  Cases 

Many  graphs  have  decomposition  trees  in  which  the  number  of  cuts  decreases  very  slowly  as 
we  go  lower  down  the  tree.  In  such  cases  the  number  of  cuts  at  higher  levels  of  the  tree  may  be 
very  small.  On  the  other  hand,  in  decomposition  trees  corresponding  to  bifurcators,  the  number 
of  cuts  permitted  decreases  smoothly  as  we  go  down  the  tree.  It  is  conceivable  then,  that  the 
bifurcator permits far more cuts at higher levels than are necessary. For example, $N$-node binary
trees have decomposition trees of height $O(\log N)$ in which no more than 1 cut is required at
every level. Since the minimum bifurcator is at least $\sqrt{N}$, the decomposition tree corresponding
to the bifurcator allows far more cuts at the top levels than needed.

Similarly,  some  graphs  have  decomposition  trees  in  which  many  cuts  are  required  at  the  top 
levels,  but  this  number  decreases  very  quickly  as  we  go  down  the  decomposition  tree.  In  such 
cases,  the  minimum  bifurcator  is  large  so  that  decomposition  trees  corresponding  to  the  bifurcator 
do  not  underestimate  the  number  of  cuts  required  at  the  top  level.  However,  they  do  greatly 
overestimate the number of cuts at lower levels.

It  is  useful  to  separate  such  extreme  cases  from  a  general  discussion.  Of  course,  general  upper 
bounds  are  valid  for  graphs  with  extreme  decompositions,  but  they  may  overestimate  the  true 
bound.  A  particularly  important  reason  for  separating  these  classes  is  that  many  computationally 
useful  graphs  such  as  binary  trees  fall  into  the  first  category  while  cube-connected-cycles  and 
multidimensional  meshes  fall  into  the  second  category. 

An $N$-node graph is defined to have a type A $\sqrt{2}$-bifurcator if it has an $(O(\sqrt{N}), \sqrt{2})$-bifurcator
such that no more than $O((N/2^i)^{\alpha})$ cuts, $\alpha < 1/2$, are required for each partition at the $i$th level
of the associated decomposition tree. Observe that at the higher levels of the tree, $i \ll \log N$,
the number of cuts is far less than the $O(\sqrt{N}/2^{i/2})$ cuts allowed by the usual bifurcator.

Similarly, an $N$-node graph is defined to have a type B $\sqrt{2}$-bifurcator if it has an $(O(N^{\alpha}), \sqrt{2})$-bifurcator,
$\alpha > 1/2$, such that only $O((N/2^i)^{\alpha})$ edges are cut in any partition at the $i$th level.
Observe that for the lower levels of the tree, $i \gg 1$, this quantity is far smaller than the
$O(N^{\alpha}/2^{i/2})$ cuts allowed by the usual bifurcator.

For simplicity, we will prove results only for general $\sqrt{2}$-bifurcators in this paper. However,
whenever there is a significant difference, results for the special cases are stated separately. The
proofs for these special cases are easily worked out, and closely follow the proofs for the general
cases. We leave such details to the interested reader.


5.2  Balanced  Decomposition  Trees 

Of  particular  interest  to  the  layout  results  reported  in  this  paper  are  decomposition  trees 
where  at  each  step  of  the  decomposition,  the  two  subgraphs  are  nearly  equal  in  size.  This  section 
considers  such  balanced  decompositions  and  gives  an  effective  procedure  for  transforming  an 
arbitrary  decomposition  tree  into  one  that  is  balanced. 

Formally, a decomposition tree for a graph $G$ is balanced if each subgraph $G_w$ in the tree is
the father of two subgraphs $G_{w0}$ and $G_{w1}$ such that the numbers of nodes in the subgraphs differ
by at most 1. In addition, we say that a decomposition tree is fully balanced if it is balanced, and
if for every subgraph $G_w$ in the tree, the set of edges connecting $G - G_w$ to $G_w$ is divided into
two subsets of nearly equal size by the partition of $G_w$ into $G_{w0}$ and $G_{w1}$. (Here we allow the
number of edge connections in the two subgraphs to differ by a small constant, say 5. For the
purposes of simplicity, however, we shall often ignore such small differences and assume that the
nodes and connections are split evenly between the two subgraphs.)

Somewhat  surprisingly,  any  decomposition  tree  may  be  transformed  into  a  fully  balanced  one 
at  little  or  no  cost.  We  prove  this  in  the  following  theorem  which  generalizes  earlier  results  in  [4, 
19,  20,  21]. 

Theorem 5. Let $G$ be any $N$-node graph with an $(F_0, F_1, \ldots, F_r)$-decomposition tree $T$. Then
$G$ has a fully balanced $(F'_0, F'_1, \ldots, F'_{\log N})$-decomposition tree, such that for $0 \le i \le \log N$,

$F'_i = 6 \sum_{j=i}^{r} F_j$.

Proof. Let $\tau$ be a forest of complete binary trees consisting initially of the decomposition tree
$T$. Color the leaves of $\tau$ with two colors according to whether or not the subgraph of $G$ associated
with the leaf is empty. Apply Lemma 3 ($k = 2$) to $\tau$, removing the indicated nodes and edges of
$\tau$. Each node of $\tau$ corresponds naturally to a set of edges of $G$, namely the edges whose removal
splits the associated subgraph in two. Removing a node of $\tau$ corresponds to removing this cutset
of edges from $G$. Since no more than 2 nodes are removed from each level of $\tau$, the number of
edges removed from $G$ in applying Lemma 3 does not exceed $2 \sum_{j=0}^{r} F_j$, which is less than $F'_0$.

Further note that $G$ is divided into two disjoint subgraphs of nearly equal size by the removal
of these edges. Each subgraph, in turn, corresponds in a natural way to a subforest of complete
binary trees in $\tau$. Consider one such subgraph $G_0$ and color the leaves of the associated forest of
complete binary trees $\tau_0$ using six colors as follows:

If the leaf corresponds to an empty subgraph, color the leaf with color 1. Otherwise, if the
single node corresponding to the leaf is incident to exactly $j$ edges of $G$ removed earlier,
$0 \le j \le 4$, then color the leaf with color $j + 2$.

By applying Lemma 3 ($k = 6$) to $\tau_0$, it is clear that $G_0$ can be decomposed into two disjoint
subgraphs $G_{00}$ and $G_{01}$ of nearly equal size such that the number of edges from $G - G_0$ to $G_{00}$
is nearly equal to the number of edges from $G - G_0$ to $G_{01}$. Since at most 6 nodes were removed
from each level of $\tau_0$ and since $\tau_0$ does not contain the root of $T$, we can conclude that no more
than $6 \sum_{j=1}^{r} F_j = F'_1$ edges were removed from $G_0$.

By applying the above argument recursively, the desired fully balanced decomposition tree is
easily obtained. The only point to observe is that with each application of Lemma 3, the biggest
tree in any forest corresponding to a subgraph decreases in height by at least one. This is because
the total number of leaves in each forest is cut in half at each step. A total of $\log N + 1$ levels
are sufficient for the decomposition since the number of nodes in each subgraph is also split in
half at each step. ■

Theorem 6. Every graph with a $\sqrt{2}$-bifurcator of size $F$ has a fully balanced $\sqrt{2}$-bifurcator of
size $6(2 + \sqrt{2})F$.

Proof. The result follows immediately from the preceding theorem, with the observation
that $\sum_{i \ge 0} 2^{-i/2} \le 2 + \sqrt{2}$. ■
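The constant can be checked numerically. The sketch below assumes, as the proof of Theorem 5 suggests, that the balanced budget at a level is six times the sum of the original budgets from that level down; the function name and the truncation depth are illustrative choices of ours. For a $\sqrt{2}$-bifurcator the original budgets are $F/2^{j/2}$, so the tail sums form a geometric series with limit $2 + \sqrt{2}$.

```python
import math


def balanced_budget(F: float, level: int, depth: int) -> float:
    """Six times the tail sum of F / 2**(j/2) for j = level..depth
    (assumed form of the fully balanced cut budget at `level`)."""
    return 6 * sum(F * 2 ** (-j / 2) for j in range(level, depth + 1))


LIMIT = 6 * (2 + math.sqrt(2))   # the constant of Theorem 6, roughly 20.49
```

At every level $i$ the value `balanced_budget(F, i, depth)` stays below `LIMIT * F / 2**(i/2)`, which is the budget of a $\sqrt{2}$-bifurcator of size $6(2 + \sqrt{2})F$.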

Remark.  The  procedure  described  in  Theorems  5  and  6  can  be  implemented  in  polynomial  time. 

5.3  Embeddings  in  the  Tree  of  Meshes 

Leighton [19, 20] introduced the tree of meshes as an example of a planar graph that cannot
be laid out in linear area. He also showed that every $N$-node planar graph can be embedded in
an $O(N \log N)$-node tree of meshes. In this section, we define the tree of meshes and describe a
general strategy for embedding a graph in the tree of meshes.

The tree of meshes is formed by replacing each node of a complete binary tree with a mesh
and each edge by several edges which connect meshes at consecutive levels. More precisely, the
root of the complete binary tree is replaced by an $n \times n$ mesh (it is assumed that $n$ is a power
of 2), the nodes at the second level are replaced by $n \times n/2$ meshes, those at the third level by
$n/2 \times n/2$ meshes, and so on until the leaves of the tree are replaced by $1 \times 1$ meshes. As shown
in Figure 5, each edge of the tree is replaced with edges connecting nodes on one side of the
higher-level mesh to the top row of the mesh at the lower level. The resulting graph is called the
$n \times n$ tree of meshes $T_n$. It is not difficult to see that $T_n$ has $N = 2n^2 \log n + n^2$ nodes.

Figure 5. The $4 \times 4$ tree of meshes $T_4$.
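The node count is easy to confirm by enumerating the meshes level by level. The following sketch uses helper names of our own choosing and assumes $n$ is a power of 2, with logarithms base 2. It lists the mesh dimensions at each level and sums the nodes over all $2^{\ell}$ meshes at level $\ell$.

```python
def mesh_shapes(n: int) -> list[tuple[int, int, int]]:
    """(level, rows, cols) of the meshes at each level of the n x n tree of meshes.
    n must be a power of 2; the tree has 2*log2(n) + 1 levels."""
    log_n = n.bit_length() - 1
    shapes, rows, cols = [], n, n
    for level in range(2 * log_n + 1):
        shapes.append((level, rows, cols))
        if cols == rows:       # halve one dimension, then the other, alternately
            cols //= 2
        else:
            rows //= 2
    return shapes


def node_count(n: int) -> int:
    """Total number of nodes: 2**level meshes of size rows x cols at each level."""
    return sum(2**level * rows * cols for level, rows, cols in mesh_shapes(n))
```

For $n = 4$ the shapes are $4 \times 4$, $4 \times 2$, $2 \times 2$, $2 \times 1$ and $1 \times 1$; every level contributes $n^2 = 16$ nodes, giving 80 in total, which agrees with $2n^2 \log n + n^2$.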

For some applications, we need to consider only the top levels of the tree of meshes. We
call the subgraph consisting of levels $0, 1, \ldots, p$ of $T_n$ a truncated tree of meshes $T_{n,p}$. Note that
$p \le 2 \log N$.

Theorem 7. There is a constant $c$ such that every $N$-node graph $G$ with an $(F, \sqrt{2})$-bifurcator
can be embedded in $T_{cF,\,2\log(N/F)}$. Moreover, the embedding is regular in the sense that $F^2/N$ nodes
of $G$ are embedded in a regular fashion in each of the $N^2/F^2$ bottom-level meshes of $T_{cF,\,2\log(N/F)}$.

Proof. We first use Theorem 6 to construct a fully balanced $\sqrt{2}$-bifurcator of size $6(2 + \sqrt{2})F$
for $G$. We then use the internal meshes of $T_{cF,\,2\log(N/F)}$ to route the edges that were removed in
the upper $2 \log(N/F)$ levels of the fully balanced decomposition tree for $G$. The subgraphs in the
$(2 \log(N/F))$th level of the decomposition tree (each of which has $\lfloor F^2/N \rfloor$ or $\lceil F^2/N \rceil$ nodes) are then
embedded in the meshes on the bottom level of the truncated tree of meshes.

The internal meshes are used in the same manner that complete crossbar switches are used
in switching networks. For example, in Figure 6 six wires enter the mesh through the top, of
which four exit from the left side and two from the right. In addition, four wires enter and exit
from the sides. No matter what the ordering of the wires, they can easily be routed through the
mesh as shown. In general, if the number of wires routed through a mesh does not exceed any
side-length of the mesh, a routing may always be found. Similarly, a graph with $M$ nodes can
always be embedded in a $4M \times 4M$ mesh with nodes placed in a regular fashion.

Consider only the top $2 \log(N/F) + 1$ levels of a fully balanced decomposition tree for $G$. Each of
the subgraphs at level $2 \log(N/F)$ of the decomposition tree has $N(1/2)^{2 \log(N/F)} = F^2/N$ nodes. (For
simplicity we shall assume that $F^2/N$ is an integer.) Furthermore, if $E_i$ is the maximum number
of edges between $G - G_i$ and $G_i$, where $G_i$ is a subgraph in the decomposition tree at level $i$,
then it is easy to see that $E_0 = 0$ and, by Theorem 6, that

$E_i \le \frac{1}{2} E_{i-1} + 6(2 + \sqrt{2}) \frac{F}{2^{(i-1)/2}}$

for $1 \le i \le 2 \log(N/F)$. Solving the above recurrence, we obtain:

$E_i \le 6(2 + \sqrt{2}) \frac{F}{2^{(i-1)/2}} \sum_{j \ge 0} (\sqrt{2}/2)^j$,

and thus

$E_i \le 6(2 + \sqrt{2})^2 \frac{F}{2^{(i-1)/2}}$.
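The solution of this recurrence can be checked numerically. The sketch below takes the recurrence $E_i = \frac{1}{2}E_{i-1} + 6(2 + \sqrt{2})F/2^{(i-1)/2}$ with equality, starting from $E_0 = 0$, and compares each term with the closed-form bound $6(2 + \sqrt{2})^2 F/2^{(i-1)/2}$. The names `C`, `external_edges` and `closed_form` are ours.

```python
import math

C = 6 * (2 + math.sqrt(2))   # constant from Theorem 6


def external_edges(F: float, levels: int) -> list[float]:
    """E_0..E_levels from E_i = E_{i-1}/2 + C*F / 2**((i-1)/2), taken with equality."""
    E = [0.0]
    for i in range(1, levels + 1):
        E.append(E[-1] / 2 + C * F / 2 ** ((i - 1) / 2))
    return E


def closed_form(F: float, i: int) -> float:
    """The bound C * (2 + sqrt(2)) * F / 2**((i-1)/2) for level i >= 1."""
    return C * (2 + math.sqrt(2)) * F / 2 ** ((i - 1) / 2)
```

Unrolling the recurrence gives a geometric series with ratio $\sqrt{2}/2$, whose sum is at most $2 + \sqrt{2}$; this is where the second factor of $(2 + \sqrt{2})$ comes from.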


We now embed $G$ in $T_{cF,\,2\log(N/F)}$. First, embed each of the $(2 \log(N/F))$-level subgraphs of the
decomposition tree in the bottom level meshes. This can be done if the side of each mesh at level
$2 \log(N/F)$ exceeds $4F^2/N$. This is true provided

$cF/\sqrt{2}^{\,2\log(N/F)} \ge 4F^2/N$.

For $c \ge 4$, this inequality is easily satisfied.

Figure 6. Using a mesh in the tree of meshes as a crossbar switch.

Next embed the additional edges through the upper-level meshes in the natural way. No more
than $2E_{i+1}$ edges pass through any $i$th level mesh. Thus the routing can be performed if the
smaller side of the $i$th level meshes exceeds $2E_{i+1}$. In other words, we must have:

$cF/2^{\lceil i/2 \rceil} \ge 12(2 + \sqrt{2})^2 F/2^{i/2}$.

A simple calculation shows that the inequality is satisfied for sufficiently large $c$. ■

Remark. Throughout the paper, we express bounds using the term $\log(N/F)$. For all practical
purposes $F$ is much smaller than $N$ and this term is greater than one. Should the value of $F$ be
larger, however, we shall still define $\log(N/F)$ to be at least one. Similar interpretations are assumed
for $\log\log(N/F)$ and for $\log\log\log(N/F)$. The conventions avoid the annoying (and trivial) cases when
$F$ is very large without complicating the analysis further.

In the preceding embedding, all the nodes of $G$ were mapped to meshes at the bottom level
of the truncated tree of meshes. Thus, edges between nodes in different meshes might have to
be routed through as many as $4 \log(N/F)$ meshes. Such long edges are undesirable for a variety of
reasons. It is natural to ask whether an embedding can be found in which each edge can be
routed through fewer intermediate meshes. This is answered in the following theorem.

Theorem 8. There are constants $c$ and $k$ such that every $N$-node graph $G$ with an $(F, \sqrt{2})$-bifurcator
can be embedded in $T_{cF,\,2\log(N/F)}$ such that no edge is routed through more than $k$
intermediate meshes.


Proof. We adopt a slight variant of the strategy used in Theorems 5-7. The balancing
and embedding are done simultaneously and in the same manner as before, except at levels 0,
$k$, $2k$, $3k$, ... (where $k$ is a constant specified later). At these levels, we embed the nodes that are
incident to edges previously cut, and we cut the previously uncut edges incident to these nodes.
Of course, this could triple the number of cut edges every $k$ levels but if $k$ is sufficiently large,
this happens infrequently and is not harmful. At all other levels the procedure is the same as
before, using 6 colors and Lemma 3 to partition the decomposition tree. The process terminates
after $2 \log(N/F)$ levels.

As before, the embedding is accomplished by using meshes as switching boxes for routing
edges. We must ensure that the number of edges routed through any mesh does not exceed the
side lengths of the mesh. The calculation is the same as before except that the number of cut
edges is tripled at every $k$th level. Thus the recurrence for $E_i$ is

$E_i \le \frac{3^{1/k}}{2} E_{i-1} + 18(2 + \sqrt{2}) \frac{F}{2^{(i-1)/2}}$.

Here, we have (without loss of generality) increased the number of cut edges by a factor of 3 initially
and by a factor of $3^{1/k}$ at each level instead of increasing the number of cuts by a factor of 3 at
every $k$th level. Solving the recurrence, we find

$E_i \le 18(2 + \sqrt{2}) \frac{F}{2^{(i-1)/2}} \sum_{j \ge 0} (3^{1/k}/\sqrt{2})^j$.

For $k \ge 4$, the sum converges to a constant. The remaining analysis is the same as in Theorems
5-7, except that the constants are larger. ■


Remark. It is worthwhile to point out here that Theorems 7 and 8 could also have been
proved using Lemma 1 instead of Lemma 2. The nodes of $G$ would still be balanced in the
decomposition tree but the cut edges could only be split 1/3 - 2/3 at each decomposition. While
this increases the value of the sum, it still converges to a constant. (This is because for sufficiently
large $k$, $\frac{2\sqrt{2}}{3} 3^{1/k} < 1$.) Hence, $k$ and $c$ would be larger but the statements of the theorems remain
the same.


5.4  Layouts  for  the  Tree  of  Meshes 

Thus  far  we  have  considered  only  the  problem  of  embedding  graphs  in  the  tree  of  meshes. 
How do we lay out the tree of meshes efficiently? Clearly, any layout for the tree of meshes also
gives  a  layout  for  every  graph  that  can  be  embedded  within  the  tree  of  meshes.  In  this  section 
we  develop  two  different  layouts  for  the  tree  of  meshes. 

The first layout is a straightforward modification of the "H-tree" layout for complete binary
trees [31]. The modified layout is obtained by expanding each node of the complete binary tree
into a mesh of the appropriate size. Figure 7 shows this layout. It is easy to see that if $S(F)$
denotes the side of the layout for $T_F$, then $S(1) = 1$, and

$S(F) \le 2S(F/2) + O(F)$,

which gives $S(F) = O(F \log F)$. This means that the area of the layout for $T_F$ is bounded by
$O(F^2 \log^2 F)$. As shown in [19, 20], this bound is optimal.

Figure 7. The H-layout of the tree of meshes.
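The side-length recurrence can be unrolled explicitly. The sketch below assumes, for illustration only, that the $O(F)$ term is exactly $F$ and that $F$ is a power of 2; under those assumptions the recurrence solves to $S(F) = F \log F + F$ (logarithm base 2), consistent with the stated $O(F \log F)$ bound. The function name is ours.

```python
def layout_side(F: int) -> int:
    """S(1) = 1 and S(F) = 2*S(F/2) + F for F a power of 2
    (the O(F) term is taken to be exactly F for illustration)."""
    return 1 if F == 1 else 2 * layout_side(F // 2) + F
```

For example, `layout_side(4)` is 12, which equals $4 \log 4 + 4$; squaring the side gives an area of order $F^2 \log^2 F$.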

For  truncated  trees  of  meshes,  such  as  considered  in  Theorems  7  and  8,  a  similar  result  holds. 

Theorem 9. The truncated tree of meshes $T_{F, 2\log(N/F)}$ has a layout of area $O(F^2 \log^2(N/F))$.

Proof. The obvious restriction of the H-layout to the top levels suffices. ∎

Although the mesh edges in the layout shown in Figure 7 have length 1, the edges between
meshes can be quite long (nearly half the side of the layout). By pulling in meshes closer towards
the top level, we can reduce the length of the longest edge considerably. This technique was
introduced in [3] to produce minimax edge length layouts for trees, and generalized to graphs
with known separators. In the following theorem we lay out the truncated tree of meshes with
shorter edges, using a simplified version of the argument introduced in [3]. This layout will later
be used to find layouts with short edges for graphs embedded within the truncated tree of meshes.

Figure 8. An improved layout for the tree of meshes.


Theorem 10. The truncated tree of meshes $T_{F, 2\log(N/F)}$ can be laid out in area
$O(F^2 \log^2(N/F))$ so that mesh edges have length 1 and edges between meshes have length at
most $O(F \log(N/F) / \log\log(N/F))$.

Proof. Consider the H-tree layout of a complete binary tree of height $2 \log\log\log(N/F)$, and
having $(\log\log(N/F))^2$ leaves. Expand each linear dimension by a factor
$\theta = \Theta(F \log(N/F) / \log\log(N/F))$, so that each edge of the H-tree layout becomes a
channel of width $\theta$ and each node becomes a $\theta \times \theta$ square. The resulting area is
$(\theta \log\log(N/F))^2 = \Theta(F^2 \log^2(N/F))$.

Since  the  channels  are  much  wider  than  the  side  of  any  mesh,  we  can  stack  many  meshes 
within  one  channel.  In  particular,  as  seen  in  Figure  8,  we  embed  the  top  level  mesh  at  the  center 
of  the  layout  with  the  second-level  meshes  on  either  side.  In  the  first  stage  of  the  layout,  the 
meshes  in  the  top  levels  are  placed  together  in  a  breadth-first  manner.  Meshes  at  successive  levels 
are equally spaced at distance $\Theta(F \log(N/F) / \log\log(N/F))$ apart.

We need to ensure that every channel is wide enough to accommodate the meshes stacked within
it. To this end, let us suppose that all meshes embedded in the first stage are stacked together in
the same channel. Of course, this is a gross overestimate, but it suffices for our argument. Since the
path from the root to a leaf in the original $(\log\log(N/F))^2$-leaf H-layout has length
$\Theta(\log\log(N/F))$, a total of $c \log\log(N/F)$ levels of $T_{F, 2\log(N/F)}$ are embedded in the
first stage. The value of the constant c depends on the values of the other constants in the
$\Theta$-terms and can be made as small as necessary.

The total number of meshes embedded in the first stage is no more than $2^{1 + c \log\log(N/F)}$.
Each mesh has side length no greater than F, so to stack all these meshes within one channel of
side $\theta$, it suffices to have:

$F \cdot 2^{1 + c \log\log(N/F)} \le O\left( \frac{F \log(N/F)}{\log\log(N/F)} \right)$,

which is easily satisfied when c < 1/2. Hence every channel has sufficient width to stack all the
ith level meshes across the channel for any $i \le c \log\log(N/F)$.

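In the second stage, we embed the remaining meshes in the $\theta \times \theta$ squares. A total of
$(\log(N/F))^c / (\log\log(N/F))^2$ copies of an $O(\log(N/F))$-level truncated tree of meshes
whose top-level mesh has side $F / (\log(N/F))^{c/2}$ must be embedded in each of the
$(\log\log(N/F))^2$ $\theta \times \theta$ regions to accomplish this. Using the layout described in
Theorem 9 for each copy, the total area required in each region is

$O\left( \frac{(\log(N/F))^c}{(\log\log(N/F))^2} \left( \frac{F}{(\log(N/F))^{c/2}} \right)^2 \log^2(N/F) \right) = O\left( \left( \frac{F \log(N/F)}{\log\log(N/F)} \right)^2 \right)$.

This is precisely the amount of area available in each $\theta \times \theta$ region. Hence the
embedding is possible.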


It remains to verify that the edges between meshes have length $O(F \log(N/F) / \log\log(N/F))$.
This is easily done since meshes in adjacent levels were spaced distance
$\Theta(F \log(N/F) / \log\log(N/F))$ apart in the first stage, and since meshes in adjacent levels
were located in the same $\theta \times \theta$ region in the second stage. ∎
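The area accounting in the proof of Theorem 10 can be checked numerically. The sketch below is only an illustration under stated assumptions: every hidden constant in the O- and Θ-terms is set to 1, logarithms are base 2, the channel width is called `theta`, and the function name `theorem10_budget` is hypothetical.

```python
import math


def theorem10_budget(F, N, c=0.25):
    """Return (theta, stacked_width, region_area) with all constants set to 1.

    theta         : channel width F log(N/F) / log log(N/F)
    stacked_width : F times the first-stage mesh count 2^(1 + c log log(N/F))
    region_area   : second-stage area needed inside one theta-by-theta square
    """
    L = math.log2(N / F)           # log(N/F)
    LL = math.log2(L)              # log log(N/F)
    theta = F * L / LL
    stacked_width = F * 2 ** (1 + c * LL)
    copies = L ** c / LL ** 2      # subtrees assigned to one square
    side = F / L ** (c / 2)        # top mesh side of each subtree
    per_copy = side ** 2 * L ** 2  # Theorem 9 area bound for one subtree
    return theta, stacked_width, copies * per_copy


if __name__ == "__main__":
    theta, stacked, region = theorem10_budget(2 ** 10, 2 ** 74)
    print("channel wide enough:", stacked < theta)
    print("area needed / area available:", region / theta ** 2)
```

Under these assumptions the second-stage area equals `theta ** 2` up to rounding, matching the claim that the squares provide exactly the required area up to constant factors.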


6.  Solutions  to  the  Eight  Problems 


Using  the  framework  described  in  the  previous  section,  we  are  now  ready  to  present  general 
solutions  to  the  eight  problems  posed  in  Section  3.  Not  surprisingly,  the  methods  of  the  previous 
section  apply  almost  directly  to  these  diverse  problems.  This  supports  the  belief  that  the  divide- 
and-conquer  strategy  based  on  bifurcators  is  an  efficient  paradigm  for  VLSI  graph  layout,  and 
that  the  tree  of  meshes  is  a  versatile  network  for  solving  layout  problems.  The  solutions  presented 
in  this  section  are  evaluated  by  comparing  them  with  lower  bounds.  Some  of  the  lower  bounds 
are  new;  to  maintain  continuity,  their  proofs  are  deferred  to  Section  8. 

The  first  two  problems,  concerning  area-efficient  layouts  and  minimax  edge  length  layouts, 
were  already  addressed  directly  in  the  previous  section. 

Problem 1. Given a graph G, produce an area-efficient layout for G.

By Theorem 7 in Section 5.3, every N-node graph with an $(F, \sqrt{2})$-bifurcator can be embedded
in the truncated tree of meshes $T_{O(F), 2\log(N/F)}$. Next, by Theorem 9 in Section 5.4, the
truncated tree of meshes can be laid out in $O(F^2 \log^2(N/F))$ area. Therefore, every N-node
graph with an $(F, \sqrt{2})$-bifurcator can be laid out in $O(F^2 \log^2(N/F))$ area.

As a simple consequence of Lemma 4, every N-node graph whose smallest $\sqrt{2}$-bifurcator is
F must occupy at least $F^2$ area, for otherwise the graph would have a $\sqrt{2}$-bifurcator
strictly smaller than F. Therefore, for every graph the upper bound is at most a factor of
$O(\log^2(N/F))$ worse than optimal. As we shall see in Section 8, the upper bound is also
existentially optimal in that there are N-node graphs with $(F, \sqrt{2})$-bifurcators for all N
and F with minimum area $\Omega(F^2 \log^2(N/F))$.

Special Cases. Graphs with $(F, \sqrt{2})$-bifurcators with either of the special forms described in
Section 5.1.1 have $O(F^2)$-area layouts.

Problem 2. Given a graph G, produce an area-efficient layout for G with minimax edge length.

From Theorem 8 we know that every N-node graph with an $(F, \sqrt{2})$-bifurcator can be
embedded in the truncated tree of meshes $T_{O(F), 2\log(N/F)}$ so that no edge passes through more
than a constant number of intermediate meshes. Furthermore, the layout for the truncated tree
of meshes given in Theorem 10 guarantees that every edge between meshes has length bounded by
$O(F \log(N/F) / \log\log(N/F))$, and that every edge within a mesh has length one. Combining
these two theorems, we see that every N-node graph with an $(F, \sqrt{2})$-bifurcator has an
$O(F^2 \log^2(N/F))$-area layout with maximum edge length bounded by
$O(F \log(N/F) / \log\log(N/F))$.

This bound is also existentially optimal, as will be seen in Section 8. However, the bounds are
not guaranteed to be universally close. The only general lower bound on minimax edge length
for N-node graphs whose minimum $\sqrt{2}$-bifurcator is F is $\Omega(F^2/N)$. (This lower bound is
also existentially optimal, as will be shown in Section 8.)

The problem of minimizing maximum edge length appears to be quite difficult. Although the
preceding bounds are disappointingly weak, they are the best known. Bhatt and Cosmadakis
[2] show that even determining whether a tree can be laid out with minimax edge length one is
NP-complete.

Special Cases. The minimax edge length bounds for graphs with special $(F, \sqrt{2})$-bifurcators are
$O(\sqrt{N} / \log N)$ for type A $\sqrt{2}$-bifurcators and $O(F)$ for type B $\sqrt{2}$-bifurcators.

Problem  3.  Given  a  graph,  produce  an  area-efficient  layout  in  which  each  wire  has  bounded 
delay  in  the  capacitive  model. 

First  we  formalize  some  details  of  the  model.  As  usual,  a  graph  describes  a  connection  of 
processors,  with  an  edge  corresponding  to  a  bidirectional  link  between  two  processors.  Each 
node  is  a  processing  element  which  contains  one  driver  and  one  receiver  for  each  incident  edge. 
Every  transistor  in  a  processing  element  has  the  same  size.  Thus,  in  our  layouts,  a  node  may  be 
represented  by  a  long  and  skinny  box  of  constant  thickness,  with  length  equal  to  the  area  of  an 
internal  transistor.  Since  each  node  has  bounded  degree,  a  box  will  be  just  big  enough  to  contain 
all  the  transistors  in  the  corresponding  processor.  Note  that  different  nodes  in  the  layout  will 
have  different  lengths,  but  the  same  thickness.  We  assume  that  the  grid  spacing  is  adjusted  so 
that  nodes  and  edges  have  unit  thickness  and  may  be  laid  along  grid  lines.  Although  wires  are 
allowed  to  cross,  we  will  not  allow  nodes  to  cross;  this  corresponds  to  transistors  not  overlapping. 
Similarly, wires and nodes may not cross. The propagation delay over a wire of length $l$ driven
by a transistor of area D with capacitive load A is proportional to $(l + A)/D$. The capacitive load
presented to a transistor equals the sum of incident wire lengths and areas of adjacent transistors.
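The delay formula of the capacitive model can be made concrete with a small sketch. The function name `wire_delay` and the proportionality constant `k` are assumptions introduced for illustration; the model itself only fixes the delay up to a constant factor.

```python
def wire_delay(length, driver_area, load_area, k=1.0):
    """Delay proportional to (l + A) / D in the capacitive model.

    length      : wire length l
    driver_area : area D of the driving transistor
    load_area   : capacitive load A presented by the far end
    k           : hypothetical proportionality constant
    """
    return k * (length + load_area) / driver_area


if __name__ == "__main__":
    # Unit-size drivers: delay grows linearly with wire length.
    for l in (10, 100, 1000):
        print("unit driver", l, wire_delay(l, 1, 1))
    # Drivers and loads scaled with the wire: delay stays constant.
    for l in (10, 100, 1000):
        print("scaled driver", l, wire_delay(l, l, l))
```

The second loop illustrates why sizing each node in proportion to the lengths of its incident wires keeps every delay bounded.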

Theorem 11. Every N-node graph G with an $(F, \sqrt{2})$-bifurcator has a bounded-delay layout of
area $O(F^2 \log^2(N/F))$.


Figure  9.  Laying  out  expanded  nodes  in  a  mesh. 


Proof. As in Theorem 8 of Section 5.3, embed G in a tree of meshes so that adjacent nodes
are  mapped  to  meshes  no  more  than  a  constant  number  of  levels  apart.  Since  the  dimensions 
of  meshes  at  successive  levels,  as  well  as  the  lengths  of  edges  connecting  adjacent  meshes  in  the 
layout  of  Theorem  9,  decrease  at  the  same  geometric  rate,  we  know  that  the  length  of  an  edge  of 
G  is  proportional  to  the  side  lengths  of  the  meshes  that  contain  the  corresponding  nodes.  Assign 
to  each  node  an  area  that  is  proportional  to  the  side  lengths  of  the  mesh  in  which  it  is  embedded. 
Thus,  the  capacitive  load  on  any  node,  which  equals  the  sum  of  the  areas  of  all  the  incident  edges 
and  adjacent  nodes,  is  proportional  to  the  area  of  the  node.  In  other  words,  every  wire  in  the 
layout  has  bounded  delay. 

We need to ensure that each enlarged node can be accommodated in its assigned mesh without
blowing up the area of the layout by more than a constant factor. This can be done by increasing
the dimensions of each mesh by a constant factor, and laying out the nodes and incident edges
as shown in Figure 9. Notice that the nodes do not overlap other nodes or wires. The area of
each node remains proportional to the side lengths of the mesh containing it, and thus the delay
across every wire is bounded. ∎

Special Cases. Similarly, graphs with special $(F, \sqrt{2})$-bifurcators have $O(F^2)$-area
bounded-delay layouts.

Theorem  11  means  that  the  area  bounds  for  bounded-delay  layouts  are  no  worse  than  the 
best  known  general  area  bounds  described  for  Problem  1.  However,  it  is  not  known  whether  or 
not  there  exists  a  graph  for  which  any  bounded-delay  layout  requires  asymptotically  greater  area 
than  the  minimum  area  layout.  In  the  following  corollary,  we  show  that  the  required  increase  in 
area  is  not  very  large. 

Corollary 12. Any layout of area A for an N-node graph can be transformed into a bounded-delay
layout of area $O(A \log^2(N/\sqrt{A}))$.

Proof. By Lemma 4 of Section 5.1, every graph with a layout of area A has a
$(\sqrt{A}, \sqrt{2})$-bifurcator which can be quickly found. Then by Theorem 11, we can construct a
bounded-delay layout with area $O(A \log^2(N/\sqrt{A}))$. ∎

Remark. Unlike the previous area bounds which can be obtained only when the bifurcator for a
graph  is  already  known,  the  preceding  corollary  for  transforming  a  layout  into  a  bounded-delay 
layout  can  be  efficiently  implemented. 

Problem  4.  Given  a  graph  G,  produce  a  layout  for  G  with  few  wire  crossings. 

The layouts for the truncated tree of meshes in Theorems 9 and 10 do not have any edge
crossings. Since every N-node graph G with an $(F, \sqrt{2})$-bifurcator can be embedded within the
truncated tree of meshes $T_{O(F), 2\log(N/F)}$, this means that the number of crossings in the
layout for G cannot exceed the number of nodes in $T_{O(F), 2\log(N/F)}$. In other words, the number
of crossings in the layout for G is bounded by $O(F^2 \log(N/F))$.
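The node count behind this crossing bound can be tallied directly. The sketch below assumes that level i of the tree of meshes holds $2^i$ meshes of $F^2/2^i$ nodes each (side $F/2^{i/2}$), that levels 0 through $2\log(N/F)$ are kept, and that F and N are powers of two; the function name `truncated_tree_nodes` is hypothetical.

```python
def truncated_tree_nodes(F, N):
    """Count mesh nodes in the tree of meshes truncated at level 2*log2(N/F).

    Assumes F and N are powers of two with F <= N, and that a level-i mesh
    contains F*F / 2**i nodes.
    """
    log_ratio = (N // F).bit_length() - 1   # log2(N/F) for powers of two
    levels = 2 * log_ratio
    total = 0
    for i in range(levels + 1):
        meshes = 2 ** i
        nodes_per_mesh, remainder = divmod(F * F, meshes)
        if remainder or nodes_per_mesh == 0:
            raise ValueError("F is too small for this many levels")
        total += meshes * nodes_per_mesh    # every level contributes F*F nodes
    return total


if __name__ == "__main__":
    print(truncated_tree_nodes(16, 64))
```

Every level contributes $F^2$ nodes, so the total is $(2\log(N/F) + 1)F^2$, in agreement with the $O(F^2 \log(N/F))$ bound on crossings.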

In Section 8 we will see that this bound too is existentially optimal. We will also show that
for every N-node graph with a minimum $\sqrt{2}$-bifurcator of size F, the number of crossings plus
the number of nodes is at least $\Omega(F^2)$. Thus, if F is asymptotically greater than $\sqrt{N}$, the
number of crossings in the layout for G is no worse than a factor $O(\log(N/F))$ times optimal.

Special Cases. Graphs with special $(F, \sqrt{2})$-bifurcators can be laid out with $O(F^2)$ crossings.

Problem  5.  Given  a  graph,  produce  an  area-efficient  regular  layout  for  the  graph. 

In Theorem 7, we showed how to embed any N-node graph G with an $(F, \sqrt{2})$-bifurcator in
$T_{cF, 2\log(N/F)}$ for some constant c. Moreover, the nodes of G were divided evenly among the
$N^2/F^2$ bottom-level meshes of $T_{cF, 2\log(N/F)}$ and in each bottom-level mesh, the nodes of G
were embedded in a regular fashion. Thus to produce an $O(F^2 \log^2(N/F))$-area layout for G
that is regular, we need only produce a layout for $T_{cF, 2\log(N/F)}$ for which the nodes at the
$(2 \log(N/F))$th level are located in a regular fashion. In fact, we can do much better, as we
show in the following theorem.

Theorem 13. The truncated tree of meshes $T_{F, 2\log(N/F)}$ can be laid out in
$O(F^2 \log^2(N/F))$ area so that, for every level i, all nodes within ith level meshes are placed
in a regular fashion.

Proof. The first step is to construct a $\Theta(\log(N/F))$-layer three-dimensional layout [23] of the
truncated tree of meshes. Fold the connections between the root of the tree of meshes and each
of its two sons so that the sons fit naturally on a second layer over the root mesh. Fold the
connections to each of the meshes at the next lower level so they fit, on the third layer, directly
over the meshes on the second layer, and so forth. This generates a $\Theta(\log(N/F))$-layer
three-dimensional layout, with each layer occupying linear area. By projecting the
three-dimensional layout onto the plane in the manner of Thompson [42, pp. 36-38], the result
follows. (The same layout can be constructed by interleaving the meshes at each level.) ∎

Special Cases. The $O(F^2)$-area layouts for graphs with special $\sqrt{2}$-bifurcators are also regular.

Problem 6. Design area-efficient chips that can be configured to realize a large number of graphs.


In Section 5.3 we showed that every N-node graph with an $(F, \sqrt{2})$-bifurcator can be embedded
in a truncated tree of meshes such that the nodes of the graph are embedded in a regular fashion in
the bottom-level meshes of $T_{cF, 2\log(N/F)}$. In fact, the nodes can be mapped to fixed positions
within the meshes. Therefore, if we lay out the truncated tree of meshes on a chip with processors
at these fixed positions, we have a configurable chip for all graphs with the corresponding
bifurcator. This yields the following result. Observe that the area bounds for configurable layouts
are the same as for unrestricted layouts.


Theorem 14. Every N-node graph with an $(F, \sqrt{2})$-bifurcator has a configurable layout of area
$O(F^2 \log^2(N/F))$.


Proof. Simply make the connections in the meshes after the rest of the chip has been
fabricated. Recall that we used the meshes as crossbar switches in Theorem 7. ∎


Special Cases. Similarly, graphs with special bifurcators have $O(F^2)$-area configurable layouts.


Problem  7.  On  a  wafer  which  has  arbitrarily  distributed  defective  cells,  realize  a  given  graph 
on  the  good  cells. 

In Section 5.3 (Theorem 7) we showed how to embed any N-node graph G with an
$(F, \sqrt{2})$-bifurcator in the truncated tree of meshes $T_{O(F), 2\log(N/F)}$. The embedding had the
property that nodes of the graph could be mapped to fixed positions within the meshes at the
bottom level. Accordingly, we fixed processors at each of these positions.

Faulty  processors  on  a  wafer  therefore  correspond  to  faulty  processors  in  the  truncated  tree 
of  meshes,  the  correspondence  being  induced  via  the  layout  for  the  tree  of  meshes.  It  is  clearly 
no  longer  possible  to  realize  G  in  the  faulty  tree  of  meshes.  However,  it  is  possible  to  realize  a 
smaller  graph  with  a  similar  structure  using  only  the  functioning  processors. 

More formally, consider a class of graphs for which any N-node graph in the class has a
$\sqrt{2}$-bifurcator of size $O(f(N))$ where the function f is such that $f(x)/\sqrt{x}$ is nondecreasing
for increasing x. For example, $f(x) = \sqrt{x}$ for the class of square meshes (as well as for the
class of trees or the class of planar graphs). In what follows, we will show how to embed any
M-node graph from the class in any $T_{cf(N), 2\log(N/f(N))}$ that has M functioning processors,
where $N \ge M$ and c is a sufficiently large constant. In particular, we will show how to embed
$T_{f(M), 2\log(M/f(M))}$ in the structure. By the results in Section 5.3 of the paper, this will be
sufficient to prove the claim. Thus the layout strategy developed in Section 5 is impervious to the
existence of faulty processors. This result substantially generalizes and simplifies a similar result
proved by Leighton and Leiserson for embedding meshes around faults in [22].

Theorem 15. Given the preceding constraints on N, M, c and f, a completely functioning
truncated tree of meshes $T_{f(M), 2\log(M/f(M))}$ with M processors can be embedded in any partially
functioning truncated tree of meshes $T_{cf(N), 2\log(N/f(N))}$ with N processors (M of which are
functioning) so that the processors of the former are mapped onto the functioning processors of
the latter.


Proof. Label the functioning processors in each tree of meshes from 1 to M by counting from
left to right across the bottom level of each graph. (Recall that the processors are evenly
distributed on the bottom level.) Map the kth processor of $T_{f(M), 2\log(M/f(M))}$ onto the kth
functioning processor of $T_{cf(N), 2\log(N/f(N))}$. Route the edges of the former graph through the
meshes of the latter in the usual way, at the same time embedding meshes of the former in blocks
within the meshes of the latter.
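The processor mapping used at the start of this proof is simple enough to sketch directly. The representation of the wafer as a left-to-right list of booleans and the function name `map_around_faults` are assumptions made for illustration; the routing of edges through the meshes is not modeled.

```python
def map_around_faults(functioning):
    """Map the k-th processor of the fault-free tree to the k-th working one.

    functioning : list of booleans, read left to right across the bottom
                  level of the partially functioning tree of meshes.
    Returns a list whose k-th entry (0-based) is the bottom-level position
    assigned to processor k of the smaller, fully functioning tree.
    """
    return [position for position, ok in enumerate(functioning) if ok]


if __name__ == "__main__":
    wafer = [True, False, True, True, False, True]
    assignment = map_around_faults(wafer)
    print(assignment)  # positions used, in left-to-right order
```

Because the assignment preserves left-to-right order, any interval of positions in the larger tree receives a contiguous interval of processors of the smaller tree, which is the property the capacity argument relies on.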

It remains to show that the capacity of each mesh in $T_{cf(N), 2\log(N/f(N))}$ is sufficient for the
embedding. Consider a mesh A on the ith level of $T_{cf(N), 2\log(N/f(N))}$. This mesh has side
lengths $cf(N)/2^{i/2}$ and at most $N/2^i$ functioning processors below it in the bottom level of
the graph. The only meshes and edges of $T_{f(M), 2\log(M/f(M))}$ that are embedded in A are those
that correspond to roots of the forest of complete binary trees formed by removing the
corresponding interval of (at most $N/2^i$) processors in $T_{f(M), 2\log(M/f(M))}$. These roots are
identified by splitting $T_{f(M), 2\log(M/f(M))}$ (as in Lemma 3) at the two endpoints of the interval.
There are at most two roots at each level in the resulting forest and the sum of their side lengths
(a geometrically decreasing sum) is proportional to $f(M)/2^{j/2}$ where j is such that
$M/2^j \le N/2^i$. (Remember that there are at most $N/2^i$ processors in the leaves of the forest
so that the height of the largest complete binary tree in the forest is j where $M/2^j \le N/2^i$.)
Thus the sum of the side lengths of the meshes embedded in A is $O(f(M)/2^{j/2})$ which, for
sufficiently large c, is less than $cf(N)/2^{i/2}$ (this is the side length of A), since $N \ge M$ and
$f(x)/\sqrt{x}$ is a nondecreasing function. Hence A is large enough and the embedding is possible. ∎

Special  Cases.  A  similar  argument  works  for  graphs  with  special  bifurcators. 


Problem 8. Given a graph G, assemble G using the minimum number of copies of a single chip
having few external pin connections.

Suppose that we wish to assemble N-node graphs with $(F, \sqrt{2})$-bifurcators but that each chip
contains only m nodes, where m < N. Consider a chip consisting of a truncated tree of meshes
with the m processors divided equally among the bottom-level meshes, and external pin
connections to the top of the top level mesh. Two copies of this chip may be wired
together to form a truncated tree of meshes with 2m processors. Thus, two chips can realize
graphs with twice as many processors as a single chip can. More generally, we have the
following result.

Theorem 16. There is a universal restructurable chip with m processors and $O(F\sqrt{m/N})$
external pins, occupying area $O\left( \frac{F^2 m}{N} \log^2 \frac{\sqrt{mN}}{F} \right)$, such that
every N-node graph with an $(F, \sqrt{2})$-bifurcator can be assembled using multiple copies of the
universal chip. Furthermore, the number of chips used in the assembly is as small as possible.

Proof. Consider the top $\log N - \log m$ levels of a fully balanced decomposition tree of G.
Each of the subgraphs at level $\log N - \log m$ has $N/2^{\log N - \log m} = m$ nodes, and has a
$\sqrt{2}$-bifurcator of size $O(F\sqrt{m/N})$. By Theorem 7, each of these subgraphs can be realized
with a single universal chip consisting of a truncated tree of meshes
$T_{O(F\sqrt{m/N}), 2\log(\sqrt{mN}/F)}$ whose area is bounded by
$O\left( \frac{F^2 m}{N} \log^2 \frac{\sqrt{mN}}{F} \right)$ and which has $O(F\sqrt{m/N})$ external
pin connections. To complete the assembly, the chips are wired up by making connections between
pins on different chips as given by the decomposition tree. ∎

A noteworthy consequence of this result is that when $F = O(\sqrt{N})$, the restructurable chip
has $O(\sqrt{m})$ pins, which is independent of the size of the network to be assembled. This is the
best possible. To realize networks with larger bifurcators, the parameters of the restructurable
chip depend on the size of the network assembled.
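The dependence of the chip parameters on N, F and m can be illustrated with a short calculation. The sketch assumes that cut sizes in a fully balanced decomposition tree shrink by a factor of $\sqrt{2}$ per level, so that the subgraphs at level $\log N - \log m$ have bifurcators of size $F\sqrt{m/N}$; constant factors are ignored, and the function name `chip_parameters` is hypothetical.

```python
import math


def chip_parameters(N, F, m):
    """Return (number_of_chips, pins_per_chip) up to constant factors.

    N : nodes in the graph, F : size of its sqrt(2)-bifurcator,
    m : processors per chip. N and m are assumed to be powers of two.
    """
    levels = math.log2(N) - math.log2(m)   # levels cut away above each chip
    chips = N // m
    pins = F / math.sqrt(2) ** levels      # equals F * sqrt(m / N)
    return chips, pins


if __name__ == "__main__":
    N, m = 2 ** 20, 2 ** 10
    for F in (2 ** 10, 2 ** 12):           # F = sqrt(N), then a larger F
        chips, pins = chip_parameters(N, F, m)
        print(F, chips, round(pins, 3), math.sqrt(m))
```

For $F = \sqrt{N}$ the pin count reduces to $\sqrt{m}$ regardless of N, while for larger F it grows with the ratio $F/\sqrt{N}$.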

Special Cases. For graphs with special bifurcators, the same is true except that only $O(F^2)$ area
is used on each chip. For type A $\sqrt{2}$-bifurcators, the number of pins needed is much lower. For
example, N-node trees require only $O(\log m)$ pins per chip [4]. (As is the case for all planar
graphs, the number of pins does not depend on the number of nodes. This is because N-node
planar graphs have $\sqrt{2}$-bifurcators of size $O(\sqrt{N})$.) Recently, we improved this result to 6
pins for trees by using slightly different techniques (but by giving up the use of a small portion of
the processors on some chips). Hence, pin count constraints place no limit at all on the size of
trees that can be fabricated with a single configurable chip, no matter how many processors are
placed on each chip.


7.  Layout  Algorithms  Based  on  Graph  Bisection  Heuristics 

In the previous section we saw how a variety of layout problems could be efficiently solved
once the decomposition tree of a graph was known. All the results were of the flavor: “If G has
an $(F, \sqrt{2})$-bifurcator, then ....” But, given a graph, how do we find a small
$\sqrt{2}$-bifurcator or a suitable decomposition tree for the graph?

Some  graphs  are  easy  to  decompose,  so  that  a  small  bifurcator  can  be  found  relatively 
easily.  Such  graphs  include  trees,  cube-connected  cycles,  and,  more  generally,  graphs  that  are 
constructed  recursively.  It  is  also  easy  to  find  a  small  bifurcator  if  a  small-area  layout  is  known. 
(From Lemma 4, recall that graphs with layout area A have a $(\sqrt{A}, \sqrt{2})$-bifurcator.)

In general, however, it is extremely difficult to find small bifurcators for graphs. The reason
is that the process of graph decomposition involves the problem of graph partitioning, or graph
bisection. The graph bisection problem, also known as the “min-cut” problem, requires a graph
to be partitioned into two components of equal size, removing the minimum possible number of
edges. This problem is known to be NP-complete [13].
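To make the definition of bisection width concrete, the following sketch computes it by exhaustive search. This is only an illustration for very small graphs (the running time is exponential, as one would expect for an NP-complete problem) and is not one of the heuristics discussed in this section; the function name `bisection_width` and the edge-list representation are assumptions.

```python
from itertools import combinations


def bisection_width(n, edges):
    """Exact bisection width of a graph on nodes 0..n-1 (n even) by brute force.

    edges : iterable of (u, v) pairs.
    Tries every way of choosing n/2 nodes for one side and counts cut edges.
    """
    if n % 2:
        raise ValueError("this sketch assumes an even number of nodes")
    edges = list(edges)
    best = None
    for half in combinations(range(n), n // 2):
        side = set(half)
        cut = sum(1 for u, v in edges if (u in side) != (v in side))
        if best is None or cut < best:
            best = cut
    return best


if __name__ == "__main__":
    cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(bisection_width(4, cycle))
```

A 4-node cycle must lose two edges in any balanced split, whereas a 4-node path can be bisected by removing its middle edge alone.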

There are, however, a large number of heuristics for bisecting graphs which appear to perform
well in practice [6, 7, 10, 18, 37, 40]. Many automated layout systems use these and other
partitioning  heuristics.  Is  there  any  theoretical  justification  for  this?  In  what  follows,  we  answer 
affirmatively  by  showing  that  a  provably  good  algorithm  for  graph  bisection  can  be  tailored  into 
a  provably  good  layout  algorithm. 

The key idea is to convert a bisection width heuristic into a heuristic for drawing graphs
with few crossings. (Determining the crossing number is also NP-complete [14].) Like small-area
layouts, such drawings can be used to find small $\sqrt{2}$-bifurcators. The following theorem shows
that with a provably good bisection heuristic, the number of crossings is provably small (i.e.,
within guaranteed bounds from optimal).

Theorem 17. Suppose there is an algorithm which, for every N-node graph with bisection width
B, finds a bisection of size at most $\gamma(N)B$ in polynomial time. ($\gamma(N)$ is some nondecreasing
functional measure of error.) Then there is a polynomial time algorithm which, for every N-node
graph with crossing number C, produces a drawing with at most $O((C + N)\gamma^2(N) \log^2 N)$
crossings.

Proof. Use the bisection width algorithm to produce a decomposition tree for G by recursively
bisecting each subgraph in the tree. As in Figure 4, define $G_{w0}$ and $G_{w1}$ to be the left and
right sons of $G_w$ in the decomposition tree. Further define $B_w$ to be the bisection width of
$G_w$, $C_w$ to be the crossing number of $G_w$ and $N_w$ to be the number of nodes in $G_w$. Clearly,
$N_w = N/2^{|w|}$. A simple application of the planar separator theorem shows that
$C + N \ge \Omega(B^2)$ for any graph and thus $C_w + N_w \ge \Omega(B_w^2)$ for every w [19, 20]. Since
$G_w$ contains $G_{ww'}$ for every $w'$, we also know that $C_w \ge C_{ww'}$ and thus that
$C_w + N_w \ge \Omega(B_{ww'}^2)$ for every $w'$.

The algorithm for drawing G is recursive. At each step, we will use drawings of $G_{w0}$ and
$G_{w1}$ to construct a drawing of $G_w$. In addition, we will store a path from each node to the
exterior face of the drawing which has a small number of crossings. These paths are used when
inserting edges at each recursive step, but are otherwise only remembered and updated (i.e., they
do not count in the crossing totals). Let $C'_w$ be the number of crossings in the constructed
drawing of $G_w$ and let $P_w$ be the maximum number of edges that would have to be crossed to
draw an edge from any node in the constructed drawing of $G_w$ to the exterior of the drawing.
Using a straightforward divide-and-conquer analysis similar to that used to prove Theorem 7-8 of
[19], we can see that

$C'_w \le C'_{w0} + C'_{w1} + \gamma^2(N)B_w^2 + \gamma(N)B_w P_w$

and

$P_w \le \max(P_{w0}, P_{w1}) + \gamma(N)B_w$

for every w. Solving the latter recurrence, we find that

$P_w \le \gamma(N) \max_{w'}(B_{ww'}) \log N_w$

and thus that

$C'_w \le C'_{w0} + C'_{w1} + O(\gamma^2(N)(C_w + N_w) \log N_w)$.

It is now a straightforward matter to prove by induction on $|w|$ (starting with $|w| = \log N$ and
decreasing) that

$C'_w \le O(\gamma^2(N)(C_w + N_w) \log^2 N_w)$,

thus proving the theorem. ∎
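The solution of the path recurrence can be checked numerically on a complete decomposition tree. In the sketch below the bisection widths are drawn at random, single-node leaves are assumed to have path cost 0, and the height of a subtree plays the role of the logarithm of its node count; the function name `path_bound_holds` and these modeling choices are assumptions made for illustration.

```python
import random


def path_bound_holds(depth, gamma, seed=0):
    """Check P_w <= gamma * max(B in subtree of w) * height(w) at every node.

    P_w follows P_w = max(P_w0, P_w1) + gamma * B_w, with P = 0 at leaves.
    Bisection widths B_w are random integers in [1, 10].
    """
    rng = random.Random(seed)
    ok = True

    def solve(level):
        nonlocal ok
        if level == depth:                 # single-node subgraph
            return 0.0, 0.0
        b = rng.randint(1, 10)             # hypothetical bisection width B_w
        p0, m0 = solve(level + 1)
        p1, m1 = solve(level + 1)
        p = max(p0, p1) + gamma * b
        largest = max(b, m0, m1)           # max of B over the subtree
        height = depth - level             # log of the subtree's node count
        if p > gamma * largest * height + 1e-9:
            ok = False
        return p, largest

    solve(0)
    return ok


if __name__ == "__main__":
    print(path_bound_holds(depth=8, gamma=1.5))
```

The check succeeds because the path cost at a node is a sum of at most `height` terms, each no larger than `gamma` times the largest bisection width below that node.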


As a consequence of Theorem 17, we can prove the following result on finding $\sqrt{2}$-bifurcators.

Theorem 18. If there exists a polynomial time algorithm which finds a $\gamma(N)B$-bisection of
any N-node graph with bisection width B, then there is a polynomial time algorithm for finding a
$(\rho(N)F, \sqrt{2})$-bifurcator for any graph G where F is the size of the minimum
$\sqrt{2}$-bifurcator for G and $\rho(N) = O(\gamma(N) \log^{3/2} N)$.

Proof. First use Theorem 17 to construct a drawing for G with
$C' = O(\gamma^2(N) \log^2 N \, (C + N))$ crossings where C is the minimal crossing number of G. In
what follows, we show how this drawing can be used to construct a $\sqrt{2}$-bifurcator for G of
size $O(\gamma(N) \log N \sqrt{C + N})$.

Consider the graph $G'$ formed by replacing the $C'$ edge crossings in the drawing of G with
artificial nodes. This graph is planar and has $M = N + C'$ nodes. By the Lipton-Tarjan
planar separator theorem [29], we can conclude that $G'$ has a $\sqrt{2}$-bifurcator of size
$O(\sqrt{M}) = O(\sqrt{N + C'})$. Thus G has a $\sqrt{2}$-bifurcator of size
$O(\sqrt{N + C'}) = O(\gamma(N) \log N \sqrt{C + N})$.

By  the  optimality  of  C  and  the  solution  to  Problem  4  in  Section  6,  we  know  that  C  + 
N  <  0(F2  log  ££)  where  F  is  the  size  of  the  minimal  V^-bifurcator  of  G.  Hence,  we  have 

constructed  a  v^-bifurcator  for  G  of  size  0^'y(N)(log  N)  F \J\og  ^  =  p(Ar)F  where  p(N)  = 

O^N)  log3/2  N).  I 
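The planarization step used in the proof (replacing each crossing with an artificial node) can be illustrated with a minimal Python sketch. It assumes a straight-line drawing with integer coordinates in general position (no three edges through one point, no node in the interior of an edge); the function names `crossing_point` and `planarize` are illustrative and do not come from the paper, whose drawings need not be straight-line.

```python
from fractions import Fraction
from itertools import combinations


def orient(p, q, r):
    """Twice the signed area of triangle pqr (positive if counterclockwise)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def crossing_point(a, b, c, d):
    """Return the interior crossing point of segments ab and cd, or None."""
    d1, d2 = orient(a, b, c), orient(a, b, d)
    d3, d4 = orient(c, d, a), orient(c, d, b)
    if d1 * d2 < 0 and d3 * d4 < 0:
        t = Fraction(d3, d3 - d4)  # parameter of the crossing along ab
        return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
    return None


def planarize(pos, edges):
    """Replace every crossing by an artificial node.

    pos maps node -> (x, y) with integer coordinates; edges is a list of
    distinct node pairs. Returns (nodes, new_edges, number_of_crossings).
    """
    found = {e: [] for e in edges}  # crossing points located on each edge
    count = 0
    for e, f in combinations(edges, 2):
        if set(e) & set(f):
            continue  # edges sharing an endpoint are assumed not to cross
        p = crossing_point(pos[e[0]], pos[e[1]], pos[f[0]], pos[f[1]])
        if p is not None:
            count += 1
            x = ("crossing", count)  # the artificial node
            found[e].append((p, x))
            found[f].append((p, x))
    nodes = set(pos)
    new_edges = []
    for (u, v), hits in found.items():
        a = pos[u]
        # order the artificial nodes by distance from u along the edge
        hits.sort(key=lambda h: (h[0][0] - a[0]) ** 2 + (h[0][1] - a[1]) ** 2)
        chain = [u] + [x for _, x in hits] + [v]
        nodes.update(chain)
        new_edges.extend(zip(chain, chain[1:]))
    return nodes, new_edges, count
```

For a drawing of the complete graph on four nodes as a square with both diagonals, the sketch reports one crossing and a planar graph with $M = N + C' = 5$ nodes, matching the count used in the proof.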

Although Theorem 18 can be easily applied to the layout area problem, better
bisection-width-based bounds can be derived directly from Theorem 17. These bounds are stated in the
following theorem.

Theorem 19. If there exists a polynomial time algorithm that finds a $\gamma(N)B$-bisection for any
$N$-node graph with bisection width $B$, then there exists a polynomial time algorithm that produces
a layout for any $N$-node graph $G$ with area at most $\psi(N)A$, where $A$ is the minimum layout area
of $G$ and $\psi(N) = O(\gamma^2(N) \log^4 N)$.

Proof. First use the algorithm described in Theorem 17 to find a drawing for $G$ with at most
$\phi(N)(C + N)$ crossings, where $C$ is the crossing number of $G$ and $\phi(N) = O(\gamma^2(N) \log^2 N)$.
Convert the drawing into a planar graph by replacing each crossing with an artificial node as in
Theorem 18. Using the algorithm developed by Leiserson [26] and Valiant [45], this graph can be
laid out using at most $O(\phi(N)(C + N) \log^2 N)$ area. The construction is completed by replacing
the artificial nodes with their original edge crossings. Since $A \ge C + N$, it is clear that the
layout has area at most $\psi(N)A$, where $\psi(N) = O(\gamma^2(N) \log^4 N)$. ∎


8.  Area,  Crossing  Number  and  Edge  Length  Bounds 

In Section 6, we argued that the new framework is universally good in the sense that no graph
with an $(F, \sqrt{2})$-bifurcator has a much better layout than that provided by the framework. In
this section, we show that the framework is existentially optimal inasmuch as there exist graphs
with $(F, \sqrt{2})$-bifurcators that are laid out optimally by the framework.


8.1.  Universal  Bounds 


In the following theorem, we characterize the layout area, crossing number and minimax edge
length of a graph in terms of its minimal $\sqrt{2}$-bifurcator. Most of the bounds have already been
proved but we state them together again for convenience.

Theorem 20. Let $F$ be the size of the minimum $\sqrt{2}$-bifurcator of an $N$-node graph $G$, which has minimum
layout area $A$, minimax edge length $L$, and crossing number $C$. The following inequalities hold,
and the upper bounds can all be realized simultaneously.

$\Omega(F^2) \le A \le O(F^2 \log^2(N/F))$,

$\Omega(F^2) \le C + N \le O(F^2 \log(N/F))$,

and

$\Omega(F^2/N) \le L \le O(F \log(N/F) / \log\log(N/F))$.

Proof.  The  upper  bounds  were  proved  in  the  solutions  to  Problems  1,  2  and  4  in  Section  6. 
Note  that  the  bounds  are  all  realized  for  the  same  layout. 

The area lower bound is from Lemma 4. The crossing number lower bound follows from
the analysis in Theorem 18. In particular, any $N$-node graph with crossing number $C$ has a
$\sqrt{2}$-bifurcator of size $O(\sqrt{N + C})$. The edge length lower bound follows from the crossing number
lower bound. Since $C + N \ge \Omega(F^2)$, the wire area of the layout is at least that large and thus
at least one of the $O(N)$ wires in the network must have length $\Omega(F^2/N)$. (In fact, the average
edge length is $\Omega(F^2/N)$.) ∎
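To get a feel for the gaps between the lower and upper bounds of Theorem 20, the following Python sketch evaluates the bound expressions for concrete $N$ and $F$. Setting every hidden constant to 1 and taking logarithms base 2 are assumptions made only for illustration; the theorem fixes neither, and the function name `universal_bounds` is hypothetical.

```python
import math


def universal_bounds(N, F):
    """Evaluate the Theorem 20 expressions with all hidden constants set to 1.

    Assumes N / F > 2 so that log log (N / F) is positive.
    Returns a dict mapping each quantity to a (lower, upper) pair.
    """
    ratio = math.log2(N / F)
    return {
        "area": (F ** 2, F ** 2 * ratio ** 2),
        "crossings_plus_nodes": (F ** 2, F ** 2 * ratio),
        "max_edge_length": (F ** 2 / N, F * ratio / math.log2(ratio)),
    }
```

For example, with $N = 2^{20}$ and $F = 2^{12}$ the area expressions differ by a factor of $\log^2(N/F) = 64$ and the crossing number expressions by a factor of $\log(N/F) = 8$.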

As  we  have  noted  throughout  the  paper,  it  is  possible  to  improve  the  upper  bounds  in  Theorem 
20  for  special  classes  of  graphs.  As  we  show  in  the  next  section  however,  such  improvements  are 
not  always  possible. 

8.2.  Existential  Bounds 

We  next  show  that  the  universal  upper  and  lower  bounds  given  in  Theorem  20  are  everywhere 
existentially  tight.  We  first  define  the  expander-connected  mesh  and  show  that  it  achieves 
(simultaneously)  the  universal  lower  bounds  on  area,  crossing  number  and  edge  length  for  any 
N  and  F.  Then  we  define  the  expander-connected  mesh  of  trees  and  show  that  it  attains  the 
corresponding  universal  upper  bounds. 

An expander-connected mesh $P_{m,n}$ with $N = mn^2$ nodes is formed by superimposing $n^2$
copies of an $m$-node expander graph on $m$ copies of an $n \times n$ mesh. More precisely, define $P_{m,n}$
to be the graph consisting of $m$ disjoint $n$-by-$n$ meshes which are interlinked with additional
edges so that for each $i$ and $j$ ($1 \le i, j \le n$), the subgraph induced on the $m$ nodes which are in
the $(i,j)$ position of some mesh is an expander graph. For example, an expander-connected mesh
with $m = 2$ is shown in Figure 10. The dotted lines represent edges in the expander graphs while
the solid lines represent edges in the meshes.

Figure 10. An expander-connected mesh with $m = 2$.

Remark. Strictly speaking, the expander-connected mesh has node degree 7 and does not fit into
our layout model. This problem can be dealt with in a variety of ways but the simplest is to
replace each degree 7 node with a 7-leaf binary tree. The area, crossing number and minimax
edge length bounds for the resulting degree 3 graph differ by at most a constant factor from those
derived below for the unaltered graph. A similar fact is also true for the expander-connected mesh
of trees.
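The definition of $P_{m,n}$ can be made concrete with a short Python sketch. The interlinking graph is supplied by the caller as an edge list on $\{0, \ldots, m-1\}$; the sketch does not check that it is an expander, and using the complete graph on four nodes as a stand-in in the usage note is an assumption made purely for illustration.

```python
from collections import defaultdict


def expander_connected_mesh(m, n, link_edges):
    """Adjacency sets of m disjoint n x n meshes, interlinked at every (i, j).

    Node (i, j, k) is the (i, j) node of the k-th mesh. `link_edges` is the
    edge list of the m-node graph copied at each of the n*n positions.
    """
    adj = defaultdict(set)

    def add(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for k in range(m):  # mesh edges
        for i in range(n):
            for j in range(n):
                if i + 1 < n:
                    add((i, j, k), (i + 1, j, k))
                if j + 1 < n:
                    add((i, j, k), (i, j + 1, k))
    for i in range(n):  # interlinking edges at each mesh position
        for j in range(n):
            for k, k2 in link_edges:
                add((i, j, k), (i, j, k2))
    return adj
```

With $m = 4$, $n = 3$ and a 3-regular interlinking graph, the result has $mn^2 = 36$ nodes and maximum degree $4 + 3 = 7$, the degree mentioned in the Remark.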

In the following we show that the size of the smallest $\sqrt{2}$-bifurcator of $P_{m,n}$ is at least $\Omega(mn)$.
This is accomplished using the lower bound techniques developed in [19, 20] to prove that the
bisection width of $P_{m,n}$ is at least $\Omega(mn)$. This means that the smallest $\sqrt{2}$-bifurcator for $P_{m,n}$
has size $\Omega(mn)$.

Lemma 21. The bisection width of $P_{m,n}$ is at least $\Omega(mn)$.

Proof. Let $(i,j,k)$ denote the $(i,j)$ node of the $k$th mesh of $P_{m,n}$. In addition, let $P'_{m,n}$ denote
the graph formed by extending each expander graph of $P_{m,n}$ to a complete graph (i.e., to the
graph formed by inserting edges between nodes $(i,j,k)$ and $(i,j,k')$ for every $1 \le i, j \le n$ and
$1 \le k, k' \le m$). In what follows, we will use the methods of [19, 20] to find a lower bound on
the bisection width of $P'_{m,n}$. This, in turn, will be used to find a lower bound on the bisection
width of $P_{m,n}$.

Consider the embedding of the complete graph on $mn^2$ nodes ($K_{mn^2}$) in $P'_{m,n}$ which links
node $(i,j,k)$ to node $(i',j',k')$ via the path

$(i,j,k) \to (i \pm 1, j, k) \to (i \pm 2, j, k) \to \cdots \to (i', j, k)$
$\to (i', j \pm 1, k) \to (i', j \pm 2, k) \to \cdots \to (i', j', k)$
$\to (i', j', k')$.

(Note that the notion of an embedding used here is different than that defined in Section 2, where
edges were mapped to edge-disjoint paths in the grid.)

A simple counting argument reveals that each mesh edge of $P'_{m,n}$ is utilized at most $O(mn^3)$
times by the embedding of $K_{mn^2}$ while each complete graph edge is used at most $O(n^2)$ times.
Since at least $m^2n^4/4$ edges of $K_{mn^2}$ must cross any bisection of $K_{mn^2}$, we can thus conclude
that any bisection of $P'_{m,n}$ must cut at least $\Omega(mn)$ mesh edges or at least $\Omega(m^2n^2)$ complete
graph edges. Clearly, any bisection of $P'_{m,n}$ which cuts $\Omega(mn)$ mesh edges must also cut $\Omega(mn)$
mesh edges of $P_{m,n}$. In what follows, we will show that any bisection of $P'_{m,n}$ which cuts $s$
complete graph edges must cut at least $\Omega(s/m)$ expander edges of $P_{m,n}$. This will imply that any
bisection of $P'_{m,n}$ which cuts $\Omega(m^2n^2)$ complete graph edges must cut $\Omega(mn^2)$ expander graph
edges of $P_{m,n}$, thus completing the proof.
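The counting argument can be checked by brute force for small parameters. The Python sketch below routes every ordered pair of nodes of $P'_{m,n}$ along the path described above (first coordinate, then second coordinate, then one complete graph edge) and records how often each edge is used. Counting ordered rather than unordered pairs is an assumption of the sketch; it changes the counts only by a constant factor.

```python
from collections import Counter
from itertools import product


def embedding_congestion(m, n):
    """Return (max mesh-edge usage, max complete-graph-edge usage)."""
    nodes = list(product(range(n), range(n), range(m)))
    mesh_use = Counter()
    complete_use = Counter()
    for (i, j, k), (i2, j2, k2) in product(nodes, nodes):
        if (i, j, k) == (i2, j2, k2):
            continue
        a = i
        while a != i2:  # adjust the first coordinate inside mesh k
            b = a + (1 if i2 > a else -1)
            mesh_use[frozenset({(a, j, k), (b, j, k)})] += 1
            a = b
        c = j
        while c != j2:  # then adjust the second coordinate inside mesh k
            d = c + (1 if j2 > c else -1)
            mesh_use[frozenset({(i2, c, k), (i2, d, k)})] += 1
            c = d
        if k != k2:  # finally hop to mesh k2 along a complete graph edge
            complete_use[frozenset({(i2, j2, k), (i2, j2, k2)})] += 1
    return (max(mesh_use.values(), default=0),
            max(complete_use.values(), default=0))
```

For $m = 3$ and $n = 4$ the mesh edge usage stays below $mn^3$ and each complete graph edge is used $2n^2$ times, in line with the $O(mn^3)$ and $O(n^2)$ bounds.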

Consider a bisection of $P'_{m,n}$ which cuts $s$ complete graph edges. Let $s_{i,j}$ denote the number
of edges cut in the $(i,j)$ complete graph of $P'_{m,n}$ for $1 \le i, j \le n$. Clearly, $s = \sum_{i,j=1}^{n} s_{i,j}$.

As each node in an $m$-node complete graph is incident to at most $m - 1$ edges, we know that
the bisection of $P'_{m,n}$ divides the $(i,j)$ complete graph into two subgraphs which contain at least
$s_{i,j}/m$ nodes each. Thus at least $\Omega(s_{i,j}/m)$ edges of the $(i,j)$ expander graph of $P_{m,n}$ are cut by
the bisection. Summing, we find that the bisection cuts at least $\Omega(s/m)$ expander edges of $P_{m,n}$
in total. ∎

We can construct an expander-connected mesh with $N$ nodes and minimum $\sqrt{2}$-bifurcator $F$
for any $N$ and $F$ such that $\Omega(\sqrt{N}) \le F \le O(N)$, by setting $n = \Theta(N/F)$ and $m = \Theta(F^2/N)$.
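As a small worked instance of this parameter choice, the following Python sketch picks $m$ and $n$ so that $mn = F$ and $mn^2 = N$ hold exactly. Requiring both divisions to be exact (rather than holding up to constant factors) is a simplifying assumption, and the function name `mesh_parameters` is hypothetical.

```python
def mesh_parameters(N, F):
    """Return (m, n) with m * n == F and m * n * n == N.

    Assumes F divides N and N divides F * F; the construction in the text
    only needs these relations up to constant factors.
    """
    if N % F or (F * F) % N:
        raise ValueError("N / F and F * F / N must both be integers")
    n = N // F
    m = (F * F) // N
    return m, n
```

The two extremes behave as expected: $F = \sqrt{N}$ gives $m = 1$ (a single mesh), while $F = N$ gives $n = 1$ (a single expander).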

We now show how to construct a layout for $P_{m,n}$ which achieves (up to a constant) the universal
lower bounds for area, crossing number and minimax edge length of Theorem 20.

Theorem 22. There is a layout for $P_{m,n}$ which has area and crossing number at most $O(m^2n^2) =
O(F^2)$ and maximum edge length at most $O(m) = O(F^2/N)$.

Proof. Lay out each expander graph in an $O(m)$-by-$O(m)$ grid so that the node in the $k$th
mesh is in the $(k,k)$ position of the grid. Arrange these sublayouts in a mesh-like pattern so as
to be consistent with the mesh structure of $P_{m,n}$. Next insert the mesh edges in the natural way.
The resulting layout should look like Figure 10. It is easily verified that the area of this layout
(and hence its crossing number) is at most $O(n^2) \times O(m^2) = O(m^2n^2)$, and that every edge has
length at most $O(m)$. ∎

Figure 11. The $4 \times 4$ mesh of trees $M_{2,4}$.

Before defining the expander-connected mesh of trees, it is useful to review the definition of a
mesh of trees as proposed by Leighton in [19, 20]. (An equivalent structure, the orthogonal trees
network, has been studied by Nath, Maheshwari and Bhatt in [33]. Cappello and Steiglitz have
also studied this graph, which they call the orthogonal forests, in [8].) The 2-dimensional mesh of
trees $M_{2,n}$ (where $n$ is assumed to be a power of 2) is defined as follows. Starting with an $n \times n$
matrix of nodes and adding nodes wherever necessary, construct a complete binary tree in every
row and column of the matrix. The trees should be constructed so that

•  the  leaves  in  each  tree  are  precisely  the  nodes  in  the  corresponding  row  or  column  of  the 
original  matrix,  and 

• the subgraph induced on the nodes in each quadrant is $M_{2,n/2}$.

For example, we have drawn $M_{2,4}$ in Figure 11. The nodes in the original $4 \times 4$ matrix are
represented  by  dots.  The  nodes  which  were  added  in  order  to  form  row  trees  are  drawn  as  small 
triangles  while  those  added  to  form  column  trees  are  shown  as  small  squares.  Solid  lines  indicate 
row  tree  edges  while  dashed  lines  indicate  column  tree  edges. 
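The construction just described can be sketched in a few lines of Python. The node labels and the bottom-up pairing of adjacent leaves are choices made for this illustration only; any labelling that yields a complete binary tree over each row and column would do.

```python
def mesh_of_trees(n):
    """Return (nodes, edges) of the n x n mesh of trees, n a power of 2."""
    assert n >= 1 and n & (n - 1) == 0, "n must be a power of 2"
    leaves = [[("leaf", r, c) for c in range(n)] for r in range(n)]
    nodes = {leaf for row in leaves for leaf in row}
    edges = set()

    def add_tree(kind, line, level):
        # Build a complete binary tree bottom-up over the given leaves.
        depth = 0
        while len(level) > 1:
            depth += 1
            parents = []
            for p in range(0, len(level), 2):
                parent = (kind, line, depth, p // 2)  # added internal node
                nodes.add(parent)
                edges.add((parent, level[p]))
                edges.add((parent, level[p + 1]))
                parents.append(parent)
            level = parents

    for r in range(n):
        add_tree("row", r, leaves[r])
    for c in range(n):
        add_tree("col", c, [leaves[r][c] for r in range(n)])
    return nodes, edges
```

For $n = 4$ this yields $n^2 + 2n(n-1) = 40$ nodes and $4n(n-1) = 48$ edges, with every leaf belonging to exactly one row tree and one column tree.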

The expander-connected mesh of trees is similar to the expander-connected mesh $P_{m,n}$ except
that the meshes are replaced by meshes of trees. More precisely, the expander-connected mesh of
trees (denoted by $Q_{m,n}$) is defined to be the graph consisting of $m$ disjoint $n \times n$ meshes of trees
which are interlinked with additional edges so that for each $i$ and $j$ ($1 \le i, j \le n$), the subgraph
induced on those leaves in the $(i,j)$ position of some mesh of trees is an expander graph. For
example, we have drawn $Q_{2,2}$ in Figure 12. The dotted lines represent edges in the expander
graphs while the dashed and solid lines represent edges in the meshes of trees.

It is not difficult to check that $Q_{m,n}$ has $N = \Theta(mn^2)$ nodes and a $\sqrt{2}$-bifurcator of size $F =
mn$. In the following theorem, we will show that $Q_{m,n}$ has layout area at least $\Omega(m^2n^2 \log^2 n) =
\Omega(F^2 \log^2(N/F))$, crossing number at least $\Omega(m^2n^2 \log n) = \Omega(F^2 \log(N/F))$ and minimax edge length
at least $\Omega(mn \log n / \log\log n) = \Omega(F \log(N/F) / \log\log(N/F))$. Thus the universal upper bounds proved
in Theorem 20 are existentially tight for every $N$ and $F$.

Theorem 23. The expander-connected mesh of trees has layout area $\Theta(m^2n^2 \log^2 n)$, crossing
number $\Theta(m^2n^2 \log n)$ and minimax edge length $\Theta(mn \log n / \log\log n)$.

Proof. The upper bounds follow trivially from Theorem 20 and the fact that $Q_{m,n}$ has a
$\sqrt{2}$-bifurcator of size $O(mn)$. The lower bounds are substantially more difficult. In fact, we
suggest that the reader be familiar with the lower bound techniques described in [19, 20] for the
case when $m = 1$ before wading through the following proof for general $m$. We commence with
the area lower bound.

8.2.1.  Area  Bound 

Let $W_m(n)$ denote the minimum wire area of $Q_{m,n}$. We will show that for a sufficiently small
(but positive) constant $\alpha$,

$W_m(n) \ge \alpha m^2 n^2 \log^2 n$

for all $m$ and $n$. This will, of course, imply the desired lower bound for layout area.

The proof is by induction on $n$. Since $Q_{m,n}$ contains $n^2$ disjoint $m$-node expander graphs, the
hypothesis is clearly true for $n \le 16$ provided that $\alpha$ is a sufficiently small constant. In what
follows, we will assume that the hypothesis is true for all values less than $n$ in order to prove it
for $n$.

Consider any layout for $Q_{m,n}$ which uses $W_m(n)$ wire. Partition the layout into three vertical
strips $V_0$, $V_1$ and $V_2$ so that the center strip contains $7mn^2/8$ leaves and each outer strip contains
$mn^2/16$ leaves. Similarly partition the layout into three horizontal strips $H_0$, $H_1$ and $H_2$ so
that the middle strip contains $7mn^2/8$ leaves and each outer strip contains $mn^2/16$ leaves. For
example, see Figure 13.

Let $d$ denote the length of the longest side of the center block formed by the intersection of
$V_1$ and $H_1$. Without loss of generality, we assume that the longest side is horizontal. In what
follows, we will show that $d \ge \frac{1}{16}\sqrt{\alpha}\, mn \log n$.

Since each of the regions $V_0 \cap H_1$ and $V_2 \cap H_1$ can contain at most $mn^2/16$ leaves, it is clear
that $V_1 \cap H_1$ contains at least $3mn^2/4$ leaves. Consider the $n^{3/2}$ subgraphs of $Q_{m,n}$ produced by
eliminating the top $\frac{3}{4} \log n$ levels of the row and column trees of $Q_{m,n}$. Each of these subgraphs
is isomorphic to $Q_{m,n^{1/4}}$. By the pigeonhole principle, at least 1/4 of these subgraphs have at
least 1/2 of their leaves inside $V_1 \cap H_1$. If $d < \frac{1}{16}\sqrt{\alpha}\, mn \log n$ (otherwise, we are done), then at
most $4d < \frac{1}{4}\sqrt{\alpha}\, mn \log n$ edges can cross the boundary of $V_1 \cap H_1$. Thus, at most $\frac{1}{4} c_0 \sqrt{\alpha}\, n \log n$
of the subgraphs which have most of their leaves in $V_1 \cap H_1$ can have $m$ or more nodes or parts
of edges outside of $V_1 \cap H_1$. (This is because every partition of $Q_{m,n^{1/4}}$ for $n \ge 16$ into two
subsets, each of which contains $m$ or more nodes, requires the removal of at least $m/c_0$ edges
where $c_0$ is a constant.)

Figure 12. The expander-connected mesh of trees $Q_{2,2}$.

Figure 13. Partitioning a layout.

This means that $V_1 \cap H_1$ contains at least $\frac{1}{4} n^{3/2} - \frac{1}{4} c_0 \sqrt{\alpha}\, n \log n$ nearly complete copies of
$Q_{m,n^{1/4}}$. Since (by induction) $W_m(n^{1/4}) \ge \frac{1}{16} \alpha m^2 n^{1/2} \log^2 n$, and since each nearly complete
copy of $Q_{m,n^{1/4}}$ is missing at most $m$ nodes and edges, it is not difficult to show that the wire area
of each nearly complete copy of $Q_{m,n^{1/4}}$ is at least $\frac{1}{32} \alpha m^2 n^{1/2} \log^2 n$. Thus $V_1 \cap H_1$ contains at
least

$\left(\frac{1}{4} n^{3/2} - \frac{1}{4} c_0 \sqrt{\alpha}\, n \log n\right) \times \frac{1}{32} \alpha m^2 n^{1/2} \log^2 n$

wire area. For constant $\alpha$ sufficiently small, this is at least $\frac{1}{256} \alpha m^2 n^2 \log^2 n$. Hence $d \ge
\frac{1}{16}\sqrt{\alpha}\, mn \log n$, as claimed.

We next use the layout for $Q_{m,n}$ to construct a drawing for the complete graph on $mn^2$ nodes
(namely, the $mn^2$ leaves of $Q_{m,n}$). In particular, the edge from leaf $(i,j,k)$ to leaf $(i',j',k')$ is
drawn from $(i,j,k)$ to $(i',j',k)$ along the path from $(i,j,k)$ to $(i',j,k)$ in the $j$th row tree of the
$k$th mesh of trees and from $(i',j,k)$ to $(i',j',k)$ in the $i'$th column tree of the $k$th mesh of trees.
The edge to $(i',j',k')$ is completed by drawing a line from $(i',j',k)$ to $(i',j',k')$ directly. (Notice
that we have traced over the mesh of trees edges but not the expander edges.) No matter how
the edges are drawn in the plane, however, (e.g., they may cross or overlap) it is clear from
Figure 13 that the sum of the lengths of the edges (as measured in Euclidean space) is at least
$(mn^2/16)^2 d \ge 2^{-12} \sqrt{\alpha}\, m^3 n^5 \log n$. This is due to the fact that $(mn^2/16)^2$ edges pass from region
$V_0$ to region $V_2$ and that these regions are separated by a distance $d$.

Let $L_i$ denote the sum of the lengths of the edges in the $i$th levels of the binary trees in
the layout of $Q_{m,n}$. In addition, let $R$ denote the sum $\sum_{i,j=1}^{n} R_{i,j}$, where $R_{i,j}$ is the sum over
$1 \le k, k' \le m$ of the distance between $(i,j,k)$ and $(i,j,k')$. Each level $i$ edge is traced over at
most $mn^3 2^{-i}$ times in the drawing of the complete graph. In addition, the straight-line path
between $(i,j,k)$ and $(i,j,k')$ is traced over at most $n^2$ times for any $i$, $j$, $k$ and $k'$. Thus,

$R n^2 + \sum_{i=1}^{\log n} L_i n^3 m 2^{-i} \ge 2^{-12} \sqrt{\alpha}\, m^3 n^5 \log n$.

This means that one of the following inequalities must be true:

$R \ge 2^{-13} \sqrt{\alpha}\, m^3 n^3 \log n$

or

$\sum_{i=1}^{\log n} L_i 2^{-i} \ge 2^{-13} \sqrt{\alpha}\, m^2 n^2 \log n$.

In the first case, we observe that there is a constant $c_1$ such that $R' \ge \frac{c_1}{m} R$, where $R' =
\sum_{i,j=1}^{n} R'_{i,j}$ and $R'_{i,j}$ is the sum of the lengths of the edges in the $(i,j)$ expander graph of $Q_{m,n}$.
This observation follows from the fact that $R'_{i,j} \ge \frac{c_1}{m} R_{i,j}$ for every $i$ and $j$. (This fact can
be proved by integrating the values of $R_{i,j}$ and $R'_{i,j}$ over all vertical and horizontal cuts of the
layout. Each cut will contain $r(m - r)$ pieces of edges of $R_{i,j}$ and at least $\frac{c_1}{m} r(m - r)$ pieces of edges of
$R'_{i,j}$, where $r$ and $m - r$ are the numbers of nodes on opposite sides of the cut.) Since $W_m(n) \ge R'$
and since $R' \ge 2^{-13} c_1 \sqrt{\alpha}\, m^2 n^3 \log n$, we can conclude that (for a sufficiently small constant $\alpha$)
$W_m(n) \ge \alpha m^2 n^2 \log^2 n$, thus proving the inductive hypothesis.

In the second case, we can show by a simple contradiction argument (just plug the claimed
value back into the sum) that there exists an $i$ such that

$L_i \ge \frac{2^{-13} \sqrt{\alpha}\, m^2 n^2 \log n \, 2^i}{\beta i^2}$

where $\beta$ is the constant $\sum_{i=1}^{\infty} 1/i^2$. Using the straightforward relation

$W_m(n) \ge 2^{2i} W_m(n 2^{-i}) + L_i$,

we can conclude that

$W_m(n) \ge 2^{2i} \alpha m^2 (n 2^{-i})^2 (\log n - i)^2 + \frac{2^{-13} \sqrt{\alpha}\, m^2 n^2 \log n \, 2^i}{\beta i^2}$

$\ge \alpha m^2 n^2 \log^2 n - 2 i \alpha m^2 n^2 \log n + \frac{2^{-13} \sqrt{\alpha}\, m^2 n^2 \log n \, 2^i}{\beta i^2}$

which is at least $\alpha m^2 n^2 \log^2 n$ for a sufficiently small constant $\alpha$. This completes the proof of
the area lower bound. We next prove the minimax wire length lower bound.
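The final step needs the gain term $2^{-13} \sqrt{\alpha}\, 2^i / (\beta i^2)$ to dominate the loss term $2 i \alpha$ for every level $i$ (both sides taken after dividing out $m^2 n^2 \log n$). The Python sketch below computes one value of $\alpha$ for which this holds at all levels; it is a numerical illustration of "sufficiently small", assuming the constants as reconstructed above, and is not the constant used by the authors.

```python
import math

BETA = math.pi ** 2 / 6  # beta = sum over i >= 1 of 1 / i^2


def alpha_threshold(max_level=64):
    """Largest alpha with 2^-13 sqrt(alpha) 2^i / (beta i^2) >= 2 i alpha for all i.

    Rearranging gives sqrt(alpha) <= 2^-14 * (2^i / i^3) / beta, so the
    binding level is the one minimising 2^i / i^3.
    """
    ratio = min(2 ** i / i ** 3 for i in range(1, max_level + 1))
    return (2 ** -14 * ratio / BETA) ** 2


def gain_covers_loss(alpha, i):
    """Check the per-level inequality after dividing out m^2 n^2 log n."""
    gain = 2 ** -13 * math.sqrt(alpha) * 2 ** i / (BETA * i ** 2)
    loss = 2 * i * alpha
    return gain >= loss
```

The minimum of $2^i / i^3$ over positive integers is attained at $i = 4$, so that level determines how small $\alpha$ must be; any smaller positive $\alpha$ works as well.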

8.2.2.  Wire  Length  Bound 

From the proof of the wire area lower bound, we know that one of the following inequalities
must hold:

$R \ge 2^{-13} \sqrt{\alpha}\, m^3 n^3 \log n$, or

$\sum_{i=1}^{\log n} L_i 2^{-i} \ge 2^{-13} \sqrt{\alpha}\, m^2 n^2 \log n$.

When the first inequality holds, we showed that $W_m(n) \ge \Omega(m^2 n^3 \log n)$. Since $Q_{m,n}$
has $\Theta(mn^2)$ edges, this means that at least one of the edges in the layout must have length
$\Omega(mn \log n) \ge \Omega(mn \log n / \log\log n)$. When the second inequality holds, a simple contradiction
argument (as before, just plug the values back into the sum) can be used to show that either

1) there is an $i \le 6 \log\log n$ such that $L_i \ge \Omega(m^2 n^2 \log n \, 2^i / \log\log n)$, or

2) there is an $i > 6 \log\log n$ such that $L_i \ge \Omega(m^2 n^2 \log n \, 2^i / i^2)$.

Since there are $mn 2^{i+1}$ level $i$ edges in $Q_{m,n}$, the first condition ensures that the layout
contains a wire of length $\Omega(mn \log n / \log\log n)$. The analysis of the second case is somewhat
more difficult.

Consider a layout for $Q_{m,n}$ which achieves the minimax edge length and (among layouts which
satisfy this constraint) has minimum area. Since $W_m(n) \ge L_i$ for all $i$, the second condition
implies that

$W_m(n) \ge \Omega(m^2 n^2 \log^7 n / (\log\log n)^2)$

$\ge \Omega(m^2 n^2 \log^6 n)$

for this layout. Thus (without loss of generality) the horizontal length of the layout is at least
$\Omega(mn \log^3 n)$.

Partition the layout into three equal-area vertical strips. By the minimality of the layout area,
we can conclude that each of the outer strips contains $\Omega(mn \log^3 n)$ nodes. (Otherwise, a smaller
layout with identical minimax edge length could be constructed.)

Since each mesh of trees has diameter $O(\log n)$, each mesh of trees must be entirely contained
in an $O(mn \log^2 n)$ by $O(mn \log^2 n)$ rectangle. (Otherwise, there would be an edge of length
$\Omega(mn \log n)$ and we would be done.) Thus nodes in the same mesh of trees must be grouped
together in the layout. Since each mesh of trees contains $O(n^2)$ nodes, the outer strips must
contain $\Omega(\frac{m \log^3 n}{n})$ complete meshes of trees. Thus at least $\Omega(\frac{m \log^3 n}{n}) \ge \Omega(\frac{m}{n})$ nodes of each
expander graph are contained in the left and right outer strips of the layout. Since any two sets
of $r_1$ and $r_2$ nodes are linked by a path of length $O(\log\frac{m}{r_1} + \log\frac{m}{r_2})$ in an $m$-node expander graph,
this means that there is a path of length $O(\log n)$ connecting the left outer strip to the right outer
strip. As the strips are separated by a distance $\Omega(mn \log^3 n)$, we can conclude that the layout
contains an edge of length $\Omega(mn \log^2 n)$. This completes the proof of the minimax wire length
lower bound. We next prove the crossing number lower bound.
8.2.3. Crossing Number Bound


Let $C_m(n)$ denote the minimum crossing number of $Q_{m,n}$. As was the case with the wire area
lower bound, we will show by induction on $n$ that

$C_m(n) \ge \alpha m^2 n^2 \log n$

for a sufficiently small (but positive) constant $\alpha$. The basis of the induction follows from the
fact that $C \ge \Omega(B^2)$ for any $N$-node graph with bisection width $B \ge \Omega(\sqrt{N})$. This fact
immediately implies that the crossing number of an $m$-node expander graph is $\Omega(m^2)$. In what
follows, we will assume that the hypothesis is true for all values less than $n$ in order to prove it
for $n$.

Consider a drawing of $Q_{m,n}$ in the plane which has $C_m(n)$ crossings. By the optimality of
$C_m(n)$, we can assume that no pair of edges cross more than once and that pairs of edges incident
to the same node do not cross at all. Using the drawing for $Q_{m,n}$, construct a drawing for a
graph with $\Omega(m^2 n^4)$ edges and $mn^2$ nodes as follows.

1.  Draw  an  edge  between  every  pair  of  nodes  in  the  same  expander  graph  which  are  incident 
to  crossing  expander  graph  edges. 

2.  Draw  an  edge  between  pairs  of  leaves  in  the  same  mesh  of  trees. 

3. Draw an edge between pairs of leaves separated by a path of length 1 or 2 in the graph
formed by steps 1 and 2 above.

4.  Eliminate  multiple  edges. 

Each edge in the new graph should be drawn along the edges of $Q_{m,n}$ in the natural way (e.g.,
the edges introduced in step 1 are drawn along the corresponding crossing edges of $Q_{m,n}$). It is
not difficult to check that each expander edge is traced over at most $m$ times during step 1 and
that each level $i$ mesh of trees edge is traced over at most $n^3 2^{-i}$ times in step 2. These values
are multiplied by a factor of $O(n^2)$ for expander edges and $O(m)$ for mesh of trees edges by step
3.

Since every drawing of an $m$-node expander graph has $\Omega(m^2)$ crossings, it is not difficult to
see that the resulting graph (even after step 4) has $E = \Omega(m^2 n^4)$ edges and $N = mn^2$ nodes. In
Theorem 7-6 of [19], Leighton shows that any drawing of such a graph must have $\Omega(E^3/N^2) =
\Omega(m^4 n^8)$ crossings. Thus

$s m^2 n^4 + \sum_{i=1}^{\log n} r_i m^2 n^5 2^{-i} + \sum_{i,j=1}^{\log n} t_{i,j} m^2 n^6 2^{-i-j} \ge \Omega(m^4 n^8)$

where $t_{i,j}$ is the number of crossings in the drawing of $Q_{m,n}$ involving a level $i$ edge and a level
$j$ edge, $r_i$ is the number of crossings involving a level $i$ edge and an expander edge, and $s$ is
the number of crossings involving two expander edges. This means that one of the following
inequalities must be true:

$s \ge \Omega(m^2 n^4)$,

$\sum_{i=1}^{\log n} r_i 2^{-i} \ge \Omega(m^2 n^3)$

or

$\sum_{i,j=1}^{\log n} t_{i,j} 2^{-i-j} \ge \Omega(m^2 n^2)$.

If the first inequality holds, then we can conclude that

$C_m(n) \ge s \ge \Omega(m^2 n^4) \ge \alpha m^2 n^2 \log n$

for sufficiently small $\alpha$. If the second inequality holds, then

$C_m(n) \ge \sum_{i=1}^{\log n} r_i \ge \sum_{i=1}^{\log n} r_i 2^{-i} \ge \Omega(m^2 n^3) \ge \alpha m^2 n^2 \log n$

for sufficiently small $\alpha$. The analysis for the third case is somewhat more difficult.

Let $t_i = \sum_{j \ge i} t_{i,j}$ be the number of crossings involving a level $i$ edge and a level $j$ edge
where $j \ge i$. When the third inequality holds, it is clear that $\sum_{i=1}^{\log n} t_i 2^{-2i} \ge \Omega(m^2 n^2)$. Thus
there is an $i$ such that $t_i \ge \Omega(m^2 n^2 2^i)$. Using the inductive hypothesis, we can thus conclude
that

$C_m(n) \ge 2^{2i} C_m(n 2^{-i}) + t_i$

$\ge 2^{2i} \alpha m^2 (n 2^{-i})^2 (\log n - i) + t_i$

$= \alpha m^2 n^2 \log n - i \alpha m^2 n^2 + \Omega(m^2 n^2 2^i)$

which is at least $\alpha m^2 n^2 \log n$ for a sufficiently small constant $\alpha$. This concludes the proof of the
crossing number lower bound and of the theorem. ∎

9.  Remarks 

The  divide-and-conquer  strategy  based  on  bifurcators  has  also  been  successfully  applied  to 
the  study  of  three-dimensional  VLSI  layouts  [23].  In  addition,  the  techniques  and  results  are 
applicable  to  graph  and  data-structure  embeddings,  and  also  provide  bounds  on  one-  and  two- 
dimensional  bandwidth  minimization. 

There  are  a  number  of  problems  left  unresolved  in  this  paper.  Some  of  the  more  important 
ones  are  mentioned  below. 

1. How much area is needed to lay out an $N$-node planar graph? The best universal upper
bound is $O(N \log^2 N)$ [26, 45] while the best existential lower bound (for the tree of meshes) is
$\Omega(N \log N)$ [19, 20].

2. Is there a polynomial time algorithm for laying out trees with edges not much longer than
the minimax edge length? The best tree layout algorithm known produces layouts with edges of
length $O(\sqrt{N} / \log N)$ [3]. Although this is optimal for some trees, it is way off for others. On the
other hand, it is NP-complete to determine if a tree can be laid out with all edges of length one
[2].


3. Is there a better way to realize a network in an environment that contains defective
processors? Theorem 15 guarantees that any graph can be realized using the good processors
provided the "channels" have width $\Omega(\frac{F}{\sqrt{N}} \log\frac{N}{F})$ in a regular layout. This bound is clearly
optimal for some networks (such as expander-connected meshes of trees) but is not known to be
optimal for simpler networks. In particular, it is not known whether or not a constant number
of tracks per channel suffices to configure a mesh from the good processors. Since $F = \sqrt{N}$ for
an $N$-node mesh, the best known upper bound on channel width is $O(\log N)$.

4.  Is  there  a  provably  good,  polynomial  time  algorithm  for  the  bisection  width  problem? 
Although  the  bisection  width  problem  is  known  to  be  NP-complete  [13],  there  are  many  heuristics 
which do quite well in practice [6, 7, 10, 18, 37, 40]. Analyzing these or developing new heuristics
along  similar  lines  may  help  solve  the  layout  problem. 

5.  Is  there  a  provably  good,  polynomial  time  algorithm  for  the  crossing  number  problem? 
This  problem  was  recently  shown  to  be  NP-complete  [14],  but  the  possibility  of  approximation 
algorithms  is  not  ruled  out.  The  arguments  of  Section  7  suggest  that  graph  bisection  algorithms 
might  be  effective  for  this  problem. 